Abstract

Regression-based  heteroskedasticity  and  serial  correlation  robust 
standard  errors  and  specification  tests  are  proposed  for  linear  models  that 
may  not  represent  an  expectation  conditional  on  all  past  information.   The 
statistics  are  computable  via  a  sequence  of  linear  regressions,  and  the 
procedures  apply  to  models  estimated  by  ordinary  least  squares  or  two  stage 
least  squares.   Examples  of  the  specification  tests  include  tests  for 
nonlinearities  in  static  models,  exclusion  restriction  tests  in  finite 
distributed  lag  models,  heteroskedasticity/serial  correlation-robust  Chow 
tests,  tests  for  endogeneity,  and  tests  of  overidentifying  restrictions. 
Some  new  tests  of  the  assumptions  underlying  Cochrane-Orcutt  estimation  are 
also  proposed,  and  some  considerations  when  applying  the  various  robust  tests 
are  discussed. 


1.  Introduction 

Work by Hansen (1982), Domowitz and White (1982), White (1984) and, more
recently, by Newey and West (1987), Gallant (1988), and Gallant and White
(1988), has provided general methods for performing inference in econometric
models  that  may  be  dynamically  incomplete.   A  simple  example  of  a  dynamically 
incomplete  model  is  a  static  regression  model  with  neglected  serial 
correlation. As discussed by Hansen and Hodrick (1980) and Hansen (1982),
more  complicated  examples  arise  in  rational  expectations  models  when  the  time 
interval  relevant  for  decision  making  by  economic  agents  differs  from  the 
sampling  interval.   In  these  models  the  implied  errors  are  not  martingale 
difference  sequences  but  moving  averages  of  a  particular  order.   The 
generalized  method  of  moments  procedures  unified  by  Hansen  (1982)  are  used 
regularly in these rational expectations-type applications.

Serial  correlation  robust  procedures  have  been  much  slower  to  catch  on 
for  standard  linear  regression  applications.   There  are  probably  several 
reasons  for  this,  the  foremost  being  the  availability  of  a  competitor  that  is 
implemented  in  all  regression  packages,  namely  the  simple  AR(1)  model  that 
can  be  estimated  by  the  Cochrane-Orcutt  technique.   Estimating  a  static  or 
finite  distributed  lag  model  with  an  AR(1)  correction  is  now  as  easy  as 
estimating the model by OLS. On the other hand, heteroskedasticity/serial
correlation-robust (H/SC-robust) covariance matrix estimators suggested by
Domowitz and White (1982) and Newey and West (1987) are more difficult to
compute, and many regression packages do not report them. Currently
available H/SC-robust specification tests of more than one degree of freedom
introduce even more complications. Specification testing in the AR(1) model
is straightforward because all statistics can be computed as standard F-tests
based on quasi-differenced data.

Certain  limitations  of  the  AR(1)  model  have  been  stressed  by  many 
authors;  for  a  summary  and  references  see  Hendry,  Pagan,  and  Sargan  (1984) 
(hereafter  HPS  (1984)).   As  emphasized  by  these  authors,  one  way  to  view  the 
static  model  with  AR(1)  errors  is  as  a  restricted  version  of  a  general 
dynamic  model.   The  restrictions  on  the  dynamic  model  have  become  known  as 
"common  factor"  restrictions,  and  various  tests  of  these  restrictions  are 
available  (e.g.  HPS  (1984)  and  Harvey  (1981)).   If  the  common  factor 
restrictions are violated, then Cochrane-Orcutt type estimators may be
inconsistent for the parameters appearing in the static relationship. At a
minimum,  one  would  have  to  question  the  validity  of  the  usual  OLS  test 
statistics  based  on  quasi-differenced  data. 

By  focusing  on  the  common  factor  restrictions  one  necessarily  adopts  the 
viewpoint  that  the  dynamic  expectation  is  of  primary  importance.   As  argued 
in  section  2,  the  validity  of  the  common  factor  restrictions  in  the  dynamic 
regression  is  neither  necessary  nor  sufficient  for  methods  such  as 
Cochrane-Orcutt (C-O) or nonlinear least squares (NLS) to consistently
estimate the parameters in the static relationship. Consequently, other
testing procedures are needed to assess whether C-O estimates are consistent
for the parameters of the static relationship. Some new tests useful in
this regard are offered in section 4.

An  alternative  to  the  static  plus  AR(1)  model  is  to  correct  the  OLS 
standard  errors  for  serial  correlation,  and  to  compute  H/SC-robust 
specification  tests.   The  primary  purpose  of  this  paper  is  to  offer  forms  of 
these  statistics  that  can  be  computed  by  virtually  any  regression  package. 
Thus, these techniques are only modestly more costly than Cochrane-Orcutt in
terms of computation, while being more robust and more widely applicable.

Section  2  of  the  paper  reviews  limiting  distribution  results  for  a 
linear  model  with  heteroskedasticity  and/or  serial  correlation  of  unknown 
form. The computationally simple H/SC-robust standard errors suggested by
Wooldridge (1989) are presented and extended. The static/AR(1) model is also
presented  under  the  assumptions  most  useful  for  the  current  analysis. 
Section  3  develops  regression-based  H/SC-robust  specification  tests,  along 
with  several  examples.   Some  considerations  when  applying  these  tests  are 
discussed  in  section  4,  along  with  some  heteroskedasticity-robust  tests  for 
common factor restrictions. Section 5 contains the methods appropriate for
two stage least squares estimation, and section 6 contains some suggestions
for future research.

2.  Background  and  Motivation 

Let $\{(y_t, z_t): t = \ldots, -1, 0, 1, \ldots\}$ be a strictly stationary, ergodic time
series, where $y_t$ is a scalar and $z_t$ is a $1 \times J$ vector. Due to the work of

White  (1984)  and  others,  it  is  well-known  that  strict  stationarity  can  be 
relaxed  by  imposing  mixing  or  other  weak  dependence  requirements.   However, 
the  dependence  and  moment  conditions  imposed  typically  rule  out  integrated 
processes  or  series  with  deterministic  trends;  therefore,  there  are  no 
practical  consequences  of  the  strict  stationarity  assumption  when  analyzing 
correctly  specified  models  with  weakly  dependent  series  that  have  some 
bounded  moments.   Unit  root  processes  are  ruled  out  in  what  follows  because 
some  of  the  specification  tests  would  have  nonstandard  limiting 
distributions. Although not treated explicitly, series with deterministic
polynomial trends are easily handled in the usual manner by including an
appropriate polynomial trend in the estimation (so that the data are
appropriately detrended). In all of the subsequent calculations
(particularly auxiliary regressions), the functions of time can be used in
the same manner as the stationary regressors.

In a time series context, there are several relationships between $y_t$ and
$z_t$ that one might be interested in. The simplest model relating economic
time series is the static model, which focuses on the contemporaneous
relationship between $y_t$ and $z_t$, ignoring any dynamic aspects. In
particular, interest centers on the conditional expectation $E(y_t \mid z_t)$, so that
the static model is similar in spirit to cross section regression models.
However, due to the dependent nature of the data, the errors in static time
series regression models display serial correlation.

Assuming  linearity  of  the  conditional  expectation,  the  static  model  can 
be  written  as 

(2.1)  $E(y_t \mid z_t) = \alpha + z_t\delta, \quad t = 1, 2, \ldots$

or, in error form,

(2.2)  $y_t = \alpha + z_t\delta + u_t, \quad E(u_t \mid z_t) = 0, \quad t = 1, 2, \ldots$

Estimating a model for $E(y_t \mid z_t)$ is reasonable if one is interested in the
contemporaneous effect of $z_t$ on $y_t$. The researcher must decide if the
conditional expectation (2.1) is of interest. Sometimes y and z are more
properly viewed as being jointly determined, in which case $\delta$ is still
well-defined but not of much interest.

Except for standard regularity conditions, $E(u_t \mid z_t) = 0$ is sufficient
for the ordinary least squares estimator to be consistent for $\alpha$ and $\delta$. In
particular, there is no need to assume

(2.3)  $E(u_t \mid \ldots, z_{t+1}, z_t, z_{t-1}, \ldots) = 0,$

which is a strict exogeneity assumption on $\{z_t\}$ and operationally the same as
assuming nonrandomness of $\{z_t\}$. Further, even when the errors $\{u_t\}$ contain
substantial serial correlation, OLS estimates are generally consistent and
asymptotically normally distributed.

Another relationship frequently of interest to economists is the
distributed lag of y on z. This relationship allows one to trace the pattern
of the dynamic effects of a change in contemporaneous z on subsequent values
of y. The expectation of interest is the expectation of $y_t$ given the current
and past values of z, $E(y_t \mid z_t, z_{t-1}, \ldots)$. If it is assumed that the effect of
$z_{t-j}$ on $y_t$ is zero for $j > Q$ then

(2.4)  $E(y_t \mid z_t, z_{t-1}, \ldots) = \alpha + z_t\delta_0 + z_{t-1}\delta_1 + \ldots + z_{t-Q}\delta_Q$

or

(2.5)  $y_t = \alpha + z_t\delta_0 + z_{t-1}\delta_1 + \ldots + z_{t-Q}\delta_Q + u_t, \quad E(u_t \mid z_t, z_{t-1}, \ldots) = 0.$

From a statistical viewpoint the error assumption in (2.5) could be replaced
by the weaker requirement

(2.6)  $E(u_t \mid z_t, \ldots, z_{t-Q}) = 0,$

but then the $\delta_j$ would be more difficult to interpret. In what follows a
finite distributed lag model is defined by (2.4) or (2.5).

As with the static model, the strict exogeneity assumption (2.3) is not
required for OLS to consistently estimate $\delta_0, \delta_1, \ldots,$ and $\delta_Q$. Also, the
errors $\{u_t\}$ in (2.5) will generally exhibit serial correlation and
heteroskedasticity.

The  static  and  finite  distributed  lag  models  are  special  cases  of  the 
statistical  model 


(2.7)  $y_t = x_t\beta + u_t, \quad E(u_t \mid x_t) = 0, \quad t = 1, 2, \ldots$

where $x_t$ is a $1 \times K$ subvector of $(1, z_t, y_{t-1}, z_{t-1}, \ldots)$ (the lag lengths
appearing in $x_t$ are necessarily invariant across t). Before proceeding to
the statistical analysis of (2.7), it is important to stress that, of the
models discussed so far -- the static, distributed lag, and general dynamic
models (i.e. $x_t$ contains lags of y as well as lags of z) -- none is
necessarily the "true" model or the "true" data generating mechanism. The
models simply correspond to different conditional expectations of the
variable $y_t$. Ideally, economic theory specifies whether $E(y_t \mid z_t)$,
$E(y_t \mid z_t, z_{t-1}, \ldots)$, $E(y_t \mid y_{t-1}, z_{t-1}, \ldots)$, or some other expectation is the one
of interest. For example, rational expectations places restrictions on
expectations of the form $E(y_t \mid y_{t-j}, z_{t-j}, y_{t-j-1}, z_{t-j-1}, \ldots)$ for some integer
$j \geq 1$. But much of the time it is up to the researcher to specify which
relationship is of interest.

In the general model (2.7), the law of iterated expectations implies
that $x_t$ and $u_t$ are uncorrelated: $E(x_t'u_t) = E[E(x_t'u_t \mid x_t)] = E[x_t'E(u_t \mid x_t)] = 0$.
Importantly, this is true whether or not $x_t$ contains lagged dependent
variables and whether or not $\{u_t\}$ is serially correlated. Provided that
$E(y_t \mid x_t)$ is of interest, OLS consistently estimates the coefficients
appearing in this expectation under general regularity conditions. Because
these conditions are covered in detail by White (1984), as a starting point
it is assumed that the OLS estimator

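(2.8)  $\hat\beta = (X'X)^{-1}X'Y = \beta + \left(T^{-1}\sum_{t=1}^{T} x_t'x_t\right)^{-1} T^{-1}\sum_{t=1}^{T} x_t'u_t$

is asymptotically normally distributed. More precisely,

(2.9)  $\sqrt{T}(\hat\beta - \beta) \overset{d}{\to} N(0, A^{-1}BA^{-1}),$

where

$A \equiv E(X'X/T) = E(x_t'x_t),$

$B \equiv \lim_{T\to\infty} V\left[T^{-1/2}\sum_{t=1}^{T} x_t'u_t\right] = \Omega_0 + \sum_{j=1}^{\infty}(\Omega_j + \Omega_j'),$

and

$\Omega_j \equiv E(s_{t+j}s_t'), \quad s_t \equiv x_t'u_t.$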

The structure of B reveals that, although the serial correlation structure of
$\{s_t \equiv x_t'u_t\}$ does not affect the ability of OLS to estimate $E(y_t \mid x_t)$, it does
manifest itself in the limiting distribution of the OLS estimator.
Nevertheless, provided one has consistent estimators $\hat A$ of A and $\hat B$ of B, in
practice one carries out inference on $\beta$ as if

$\hat\beta \sim N(\beta, \hat A^{-1}\hat B\hat A^{-1}/T).$

$\hat A^{-1}\hat B\hat A^{-1}/T$ is an estimator of the asymptotic variance of $\hat\beta$, $A^{-1}BA^{-1}/T$. A
consistent estimator of A in the present context is simply $\hat A \equiv X'X/T$.
Estimation of B is generally more difficult because, as seen above, B
generally depends on the autocorrelation and variance structure of $\{x_t'u_t:
t = 1, 2, \ldots\}$. Before proceeding to the general case it is useful to recall the
conditions that provide asymptotic justification for the use of the usual
t-statistics and F-statistics.

The  appropriate  no  serial  correlation  assumption  with  possibly  random 
regressors  is 

(2.10)  $E(u_{t+j}u_t \mid x_{t+j}, x_t) = 0, \quad j \geq 1.$

Although (2.10) implies that the $\{u_t\}$ are unconditionally uncorrelated, i.e.
$E(u_{t+j}u_t) = 0$, $j \geq 1$, this latter condition is generally insufficient for the
usual test statistics to be approximately valid (unless the $x_t$ are treated as
nonrandom). The important consequence of (2.10) is

$\Omega_j = E(x_{t+j}'u_{t+j}u_tx_t) = 0, \quad j \geq 1,$

which follows from (2.10) by a straightforward application of the law of
iterated expectations. Consequently, under (2.10) B reduces to the simple
formula

$B = T^{-1}\sum_{t=1}^{T} E(u_t^2x_t'x_t) = E(u_t^2x_t'x_t).$

The appropriate homoskedasticity assumption with possibly random
regressors is

(2.11)  $E(u_t^2 \mid x_t) = \sigma^2.$

As with serial correlation, (2.11) imposes conditional homoskedasticity on $u_t$
(or $y_t$); it is not enough that $E(u_t^2)$ be constant across t (which is always
the case here by stationarity). If $x_t$ contains lagged dependent variables
then (2.11) rules out certain dynamic forms of heteroskedasticity, such as
Engle's (1982) ARCH model.

If (2.11) holds in addition to (2.10) then another application of the
law of iterated expectations gives $B = \sigma^2E(x_t'x_t) = \sigma^2A$, and the asymptotic
variance of $\sqrt{T}(\hat\beta - \beta)$ reduces to the well-known formula

$AV\,\sqrt{T}(\hat\beta - \beta) = \sigma^2[E(X'X/T)]^{-1}.$

Estimating $\sigma^2$ by the usual degrees-of-freedom adjusted estimator

(2.12)  $\hat\sigma^2 \equiv (T-K)^{-1}\sum_{t=1}^{T}(y_t - x_t\hat\beta)^2 = SSR/(T-K)$

produces the usual standard errors and test statistics, which are
asymptotically valid under (2.7), (2.10) and (2.11).


The homoskedasticity assumption (2.11) is a convenience that can never
be guaranteed to hold a priori: a model for $E(y_t \mid x_t)$ by definition imposes
no restrictions on $V(y_t \mid x_t)$. In contrast, there is one well-known case where
the no serial correlation assumption is satisfied. Suppose that $u_t$ is
unpredictable given $x_t$ and all past information $\phi_{t-1} \equiv
(y_{t-1}, x_{t-1}, y_{t-2}, x_{t-2}, \ldots)$ (equivalently, $\phi_{t-1}$ can be taken to be
$(u_{t-1}, x_{t-1}, u_{t-2}, x_{t-2}, \ldots)$). More formally, the condition is

(2.13)  $E(u_t \mid x_t, \phi_{t-1}) = 0,$

which implies that

(2.14)  $E(y_t \mid x_t, \phi_{t-1}) = E(y_t \mid x_t) = x_t\beta.$

When (2.14) holds, $x_t$ contains enough lags of y and/or z (which in principle
could be no lags of y or z) so that additional lags do not help to predict
$y_t$; if this is the case, model (2.7) is said to be dynamically complete.
Dynamic completeness is easily seen to imply the no serial correlation
assumption (2.10): because $(x_{t+j}, u_t, x_t) \subset (x_{t+j}, \phi_{t+j-1})$, it follows that
$E(u_{t+j} \mid x_{t+j}, \phi_{t+j-1}) = 0$; the law of iterated expectations then implies

(2.15)  $E(u_{t+j} \mid x_{t+j}, u_t, x_t) = 0, \quad \text{all } j \geq 1,$

which implies (2.10) and $B = E(u_t^2x_t'x_t)$. If heteroskedasticity is present,
the usual covariance matrix estimator must be modified. The White (1980)
heteroskedasticity-robust covariance matrix estimator is easily shown to be
consistent in time series applications with no serial correlation; see also
Hsieh (1983).
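
As an illustration of the heteroskedasticity-robust estimator just mentioned, the following is a minimal NumPy sketch, not a definitive implementation. The function name `white_cov`, the simulated data, and the optional T/(T-K) scaling are assumptions made for this example only.

```python
import numpy as np

def white_cov(X, u_hat, dof_adjust=True):
    """Heteroskedasticity-robust covariance of the OLS estimator.

    Computes (X'X)^{-1} [sum_t u_t^2 x_t'x_t] (X'X)^{-1}; the T/(T-K)
    scaling is an optional small-sample adjustment (an assumption here).
    """
    T, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (u_hat ** 2)[:, None]).T @ X      # sum_t u_t^2 x_t'x_t
    if dof_adjust:
        meat = meat * T / (T - K)
    return XtX_inv @ meat @ XtX_inv

# Hypothetical data: static model whose error variance depends on z_t.
rng = np.random.default_rng(0)
T = 200
z = rng.normal(size=T)
X = np.column_stack([np.ones(T), z])
y = 1.0 + 2.0 * z + rng.normal(size=T) * (1.0 + 0.5 * np.abs(z))

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
robust_se = np.sqrt(np.diag(white_cov(X, u_hat)))
```

With only an intercept in the regression and the T/(T-K) scaling switched on, this estimator reproduces the usual variance of the sample mean, which gives a quick sanity check.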

If the model is not dynamically complete then (2.10) typically fails and
the usual and White covariance matrix estimators are inconsistent. For
static (and finite distributed lag) models, a popular procedure when serial
correlation is detected is to assume that the errors follow an AR(1) process.
As it is usually analyzed, this model can be expressed as

(2.16)  $y_t = \alpha + z_t\delta + u_t, \quad E(u_t \mid z_t) = 0, \quad t = 1, 2, \ldots$

(2.17)  $u_t = \rho u_{t-1} + e_t, \quad E(e_t \mid z_t, \phi_{t-1}) = 0, \quad t = 1, 2, \ldots, \quad |\rho| < 1,$

with the additional, although less important, homoskedasticity assumption
$E(e_t^2 \mid z_t, \phi_{t-1}) = \sigma_e^2$. In section 4 it will be shown that these assumptions on
$e_t$ are in fact stronger than needed for the Cochrane-Orcutt method to
consistently estimate $\beta \equiv (\alpha, \delta')'$. However, assumptions (2.16) and (2.17)
(and $E(e_t^2 \mid z_t, \phi_{t-1}) = \sigma_e^2$) ensure that the usual statistics based on
quasi-differenced data (with the estimated $\rho$) are valid. Also, these are the
assumptions underlying the usual LM test for $H_0$: $\rho = 0$. The homoskedasticity
assumption is less important because it can be shown that using
heteroskedasticity-corrected test statistics in the quasi-differenced
regressions produces valid test statistics. (This follows because, letting $\hat\beta$
and $\hat\rho$ denote the C-O or NLS estimators of $\beta \equiv (\alpha, \delta')'$ and $\rho$, (2.16) and
(2.17) ensure that the limiting distribution of $\sqrt{T}(\hat\beta - \beta)$ does not depend on
that of $\sqrt{T}(\hat\rho - \rho)$.)

It is important to observe that (2.16) is a model of the static
conditional expectation $E(y_t \mid z_t)$; (2.17) then implies a particular form for
the dynamic conditional expectation:

(2.18)  $E(y_t \mid z_t, \phi_{t-1}) = (1-\rho)\alpha + z_t\delta + \rho(y_{t-1} - z_{t-1}\delta), \quad t = 1, 2, \ldots$

Letting $x_t \equiv (1, z_t, y_{t-1}, z_{t-1})$ shows that (2.18) can be expressed as a dynamic
linear model such that $E(y_t \mid x_t) = E(y_t \mid x_t, \phi_{t-1})$, i.e. (2.18) is dynamically
complete. Also, note that if (2.16) and (2.17) hold, and $\rho \neq 0$, then (by the
law of iterated expectations) it must be the case that

(2.19)  $E(u_t \mid z_{t+1}) = 0;$

this is a type of exogeneity condition on the explanatory variables that is
imposed by the AR(1) model. In section 4, this condition is shown to be
critical for C-O to produce consistent estimates of $\delta$.
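
The step from (2.16) and (2.17) to (2.19) can be written out as follows. This is only a sketch of the iterated-expectations argument and uses no assumptions beyond those already stated. Taking the expectation of (2.17) conditional on $z_t$,

\[
0 = E(u_t \mid z_t) = \rho E(u_{t-1} \mid z_t) + E[E(e_t \mid z_t, \phi_{t-1}) \mid z_t] = \rho E(u_{t-1} \mid z_t),
\]

so that $\rho \neq 0$ implies $E(u_{t-1} \mid z_t) = 0$ for all t, which is (2.19) after shifting the time index forward by one period.
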
An unrestricted version of (2.18) is

(2.20)  $y_t = \alpha_0 + z_t\delta_0 + \rho_1y_{t-1} + z_{t-1}\delta_1 + e_t,$

and the common factor restrictions are embodied in the J nonlinear
constraints

(2.21)  $\delta_1 = -\rho_1\delta_0.$

Under (2.16) and (2.17), the regression

$y_t$ on $1, z_t, y_{t-1}, z_{t-1}$

consistently estimates $\delta = \delta_0$ as the coefficient vector on $z_t$. However, it
is important to see that (2.20) and (2.21) can hold with $\delta_0$ bearing no
resemblance to $\delta$. As an example, suppose that $\delta_0 = \delta_1 = 0$ in (2.20), so that
(2.21) is trivially true, while $\delta$ in (2.16) is different from zero. Nothing
about (2.16), (2.20), and (2.21) rules out this possibility. On the other
hand, a rejection of (2.21) in the context of model (2.20) tells one nothing
about the static relationship; $E(y_t \mid z_t)$ is still well-defined and potentially
of interest.

In fact, posing the AR(1) model as (2.16) and (2.17) suggests that at
least some interest lies in the vector $\delta$ describing the contemporaneous
relationship between $y_t$ and $z_t$. The AR(1) assumption justifies the use of
the Cochrane-Orcutt method to obtain standard errors, t-statistics, and other
test statistics that have the usual interpretations and are asymptotically
optimal (assuming that $E(e_t^2 \mid z_t, \phi_{t-1})$ is constant). In most of the common
factor literature -- e.g. Sargan (1964, 1980), Hendry and Mizon (1978),
Hendry, Pagan, and Sargan (1984) -- the existence of common factors such as
those implied by the AR(1) model is interpreted as implying that the
relationship between $y_t$ and $z_t$ is static, with the dynamics entering only
through the error term. On the other hand, rejection of the common factor
restrictions is viewed as implying a dynamic relationship between y and z,
and therefore the unrestricted model (2.20) should be estimated. A different
perspective is that the vector $\delta$ is well-defined whether or not the common
factor restrictions hold; it is simply the case that $\delta \neq \delta_0$, so that the link
between the static and dynamic expectations has been broken. Without a
specific context it is unclear whether $\delta$ is of less interest simply because a
certain set of nonlinear constraints on the parameters of the dynamic
expectation are not satisfied.

An important consequence of the preceding discussion is that the
conditions underlying the consistency and, more generally, the validity of
the usual test statistics in the static (or DL) model with AR(1) errors
are not innocuous. If interest lies in the static relationship $E(y_t \mid z_t)$, the
DL relationship $E(y_t \mid z_t, z_{t-1}, \ldots)$, or some other expectation that is
dynamically incomplete, without also imposing assumptions on the fully
dynamic conditional expectation $E(y_t \mid z_t, \phi_{t-1})$, or on the exogeneity
properties of $z_t$, then it is possible to compute serial correlation robust
standard errors of the OLS estimator. The no serial correlation assumption
(2.14) can be replaced by an assumption ensuring that the dependence in
$\{x_t'u_t\}$ dies off sufficiently fast for B to be consistently estimated. For
robustness reasons estimating $E(y_t \mid z_t)$ via a static regression and computing
corrected standard errors is often preferred to performing an AR(1)
correction. Testing the common factor restrictions is reviewed in section 4.

A heteroskedasticity/serial correlation-robust estimator of B has been
recently proposed by Newey and West (1987) and Gallant and White (1988); both
papers modify an estimator due to White and Domowitz (1984). The estimator
is given by

(2.22)  $\hat B \equiv \hat\Omega_0 + \sum_{j=1}^{G}\varphi(j,G)[\hat\Omega_j + \hat\Omega_j'],$

where

$\hat\Omega_j \equiv (T-K)^{-1}\sum_{t=j+1}^{T}\hat s_t'\hat s_{t-j}, \quad \hat s_t \equiv x_t\hat u_t,$

and

(2.23)  $\varphi(j,G) = 1 - j/(G+1), \quad j = 1, \ldots, G$
        $\varphi(j,G) = 0, \quad j = G+1, G+2, \ldots$

are weights that have been used in the literature on spectral density
estimation, and G is a nonnegative integer. As Newey and West (1987) show,
the weighting in (2.22) ensures that $\hat B$ is positive semi-definite. The
degrees of freedom adjustment factor $(T-K)^{-1}$ has been used in the definition
of $\hat\Omega_j$ because there is some evidence that it reduces finite sample bias.

Given the estimator $\hat B$, the heteroskedasticity/serial correlation-robust
covariance matrix estimator of $\hat\beta$ is given by

(2.24)  $\hat V/T \equiv (X'X/T)^{-1}\hat B(X'X/T)^{-1}/T = (X'X)^{-1}(T\hat B)(X'X)^{-1}.$

The asymptotic standard error of $\hat\beta_j$ is obtained as the square root of the jth
diagonal element of this matrix.
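
The matrix computation in (2.22)-(2.24) can be sketched in NumPy as follows. This is an illustrative sketch only: the function name `newey_west_cov` and the simulated AR(1) data are assumptions of the example, and the lag truncation G is set to the integer part of T^(1/4) purely for concreteness.

```python
import numpy as np

def newey_west_cov(X, u_hat, G):
    """H/SC-robust covariance (X'X)^{-1} (T B_hat) (X'X)^{-1}, as in (2.24)."""
    T, K = X.shape
    s = X * u_hat[:, None]                   # row t is s_t = x_t * u_t
    B = s.T @ s / (T - K)                    # Omega_0
    for j in range(1, G + 1):
        weight = 1.0 - j / (G + 1.0)         # weights (2.23)
        omega_j = s[j:].T @ s[:-j] / (T - K) # sum over t of s_t' s_{t-j}
        B = B + weight * (omega_j + omega_j.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ (T * B) @ XtX_inv

# Hypothetical data: static regression with AR(1) regressor and AR(1) error.
rng = np.random.default_rng(1)
T = 300
z = np.zeros(T)
u = np.zeros(T)
ez = rng.normal(size=T)
eu = rng.normal(size=T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + ez[t]
    u[t] = 0.7 * u[t - 1] + eu[t]
y = 1.0 + 2.0 * z + u
X = np.column_stack([np.ones(T), z])

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
G = int(T ** 0.25)
V = newey_west_cov(X, u_hat, G)
robust_se = np.sqrt(np.diag(V))
```

Setting G = 0 drops every autocovariance term, so the function then returns the heteroskedasticity-robust estimator with the same degrees of freedom adjustment.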

Sometimes it is useful to be able to avoid the matrix manipulations
involved in computing (2.24). By focusing on one variance (or covariance) at
a time, an H/SC-robust estimator can be obtained from simple OLS regressions.
Wooldridge (1990b) shows that the following procedure is valid for computing
an H/SC-robust standard error for $\hat\beta_j$.

PROCEDURE 2.1:

(i) Run the regression

(2.25)  $y_t$ on $x_{t1}, x_{t2}, \ldots, x_{tK}, \quad t = 1, \ldots, T$

and obtain "se($\hat\beta_j$)", $\hat\sigma$, and the residuals $\{\hat u_t: t = 1, \ldots, T\}$. Here "se($\hat\beta_j$)"
denotes the usual (generally incorrect) standard error reported for $\hat\beta_j$, and $\hat\sigma$
is the standard error of regression (2.25).

(ii) Run the regression

(2.26)  $x_{tj}$ on $x_{t1}, \ldots, x_{t,j-1}, x_{t,j+1}, \ldots, x_{tK}, \quad t = 1, \ldots, T$

and save the residuals, say $\{\hat r_{tj}: t = 1, \ldots, T\}$.

(iii) Define $\hat\xi_t \equiv \hat r_{tj}\hat u_t$ and let

(2.27)  $\hat c_j \equiv \hat\omega_0 + 2\sum_{s=1}^{G}\varphi(s,G)\hat\omega_s,$

where

$\hat\omega_s \equiv (T-K)^{-1}\sum_{t=s+1}^{T}\hat\xi_t\hat\xi_{t-s}, \quad s = 0, \ldots, G,$

$\varphi(s,G)$ is given by (2.23), and G is, say, the integer part of $T^{1/4}$.
Alternatively, compute $\hat c_j$ as

(2.28)  $\hat c_j \equiv [T/(T-K)]\hat\tau^2/(1 - \hat a_1 - \hat a_2 - \ldots - \hat a_G)^2,$

where $\hat a_i$, $i = 1, \ldots, G$, are the OLS coefficients from the autoregression

(2.29)  $\hat\xi_t$ on $\hat\xi_{t-1}, \ldots, \hat\xi_{t-G},$

and $\hat\tau^2$ is the square of the usual standard error of regression (2.29).

(iv) The H/SC-robust standard error of $\hat\beta_j$ is

(2.30)  $se(\hat\beta_j) \equiv \left[\sum_{t=1}^{T}\hat r_{tj}^2\right]^{-1}(T\hat c_j)^{1/2} = ["se(\hat\beta_j)"/\hat\sigma]^2(T\hat c_j)^{1/2}.$

Equation  (2.30)  offers  a  simple  adjustment  to  the  usual  OLS  standard 
error  that  is  robust  to  heteroskedasticity  and  serial  correlation,  which 
simply  requires  the  additional  OLS  regressions  (2.25)  and  (2.29). 
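
Steps (i)-(iv) of Procedure 2.1, using the weighted-autocovariance form (2.27) rather than the autoregressive alternative (2.28), might be coded along the following lines. The function names, the simulated data, and the requirement of at least two regressors are assumptions of this sketch rather than part of the procedure itself.

```python
import numpy as np

def ols_resid(y, X):
    """Residuals from the OLS regression of y on the columns of X."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def hsc_robust_se(y, X, j, G=None):
    """H/SC-robust standard error of coefficient j via Procedure 2.1.

    Assumes K >= 2 regressors; G defaults to the integer part of T^(1/4).
    """
    T, K = X.shape
    if G is None:
        G = int(T ** 0.25)
    u = ols_resid(y, X)                              # step (i)
    r = ols_resid(X[:, j], np.delete(X, j, axis=1))  # step (ii)
    xi = r * u                                       # step (iii)
    c = xi @ xi / (T - K)                            # omega_0
    for s in range(1, G + 1):
        omega_s = xi[s:] @ xi[:-s] / (T - K)
        c += 2.0 * (1.0 - s / (G + 1.0)) * omega_s   # (2.27)
    return np.sqrt(T * c) / (r @ r)                  # step (iv), (2.30)

# Hypothetical data: two regressors plus an intercept, AR(1) errors.
rng = np.random.default_rng(3)
T = 250
z = rng.normal(size=(T, 2))
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + e[t]
X = np.column_stack([np.ones(T), z])
y = 1.0 + z @ np.array([0.5, -0.25]) + u

se_1 = hsc_robust_se(y, X, j=1)
```

Because the jth row of (X'X)^{-1}X' equals the step (ii) residuals divided by their sum of squares, the value returned here coincides with the square root of the jth diagonal element of (2.24) when the same G is used.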

Note that $\hat c_j$ given by (2.27) is simply (2.22) applied to the scalar
sequence $\{\hat\xi_t \equiv \hat r_{tj}\hat u_t: t = 1, \ldots, T\}$; it is a consistent estimator of the
spectral density of $\{\xi_t\}$ at frequency zero. The estimator (2.28) is Berk's
(1974) autoregressive spectral density estimator. Berk requires $G = o(T^{1/4})$
to establish consistency. If the model is dynamically complete, so that
there is no serial correlation present, then a heteroskedasticity-robust
standard error is obtained from Procedure 2.1 by setting $\hat c_j = (T-K)^{-1}\sum_{t=1}^{T}\hat\xi_t^2$.

The H/SC-consistent standard errors allow construction of t-statistics
for testing individual hypotheses about $\beta_1, \ldots, \beta_K$. Note that the choice of G
can be different for each $\hat\beta_j$; it is the serial correlation properties of
$\{r_{tj}u_t\}$ that matter for the standard error of $\hat\beta_j$. Because the standard error
of any linear combination of $\beta$ can be obtained via an OLS regression on
transformed variables, robust standard errors of linear combinations are
easily computed using Procedure 2.1. For example, a robust standard error
for the long run propensity in a distributed lag model can easily be
computed. A robust estimator for the covariance between any two
coefficients, say $\hat\beta_i$ and $\hat\beta_j$, is easily obtained from $\hat V(\hat\beta_i)$, $\hat V(\hat\beta_j)$, and, say,
$\hat V(\hat\beta_i + \hat\beta_j)$. Simply use the asymptotic analog of the relationship

$\widehat{CV}(\hat\beta_i, \hat\beta_j) = [\hat V(\hat\beta_i + \hat\beta_j) - \hat V(\hat\beta_i) - \hat V(\hat\beta_j)]/2.$
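
The covariance calculation just described can be sketched as follows. The sum of the two coefficients is obtained as a single coefficient by regressing on transformed variables (replacing column j with column j minus column i), and the three robust variances are then combined. The function names and simulated data are hypothetical, and the same G is used for all three variances as a simplifying assumption.

```python
import numpy as np

def robust_var(y, X, j, G):
    """Squared H/SC-robust standard error of coefficient j (Bartlett weights)."""
    T, K = X.shape
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    W = np.delete(X, j, axis=1)
    r = X[:, j] - W @ np.linalg.lstsq(W, X[:, j], rcond=None)[0]
    xi = r * u
    c = xi @ xi
    for s in range(1, G + 1):
        c += 2.0 * (1.0 - s / (G + 1.0)) * (xi[s:] @ xi[:-s])
    return (T / (T - K)) * c / (r @ r) ** 2

def robust_cov(y, X, i, j, G):
    """Covariance of coefficients i and j from three robust variances."""
    Xs = X.copy()
    Xs[:, j] = X[:, j] - X[:, i]   # coefficient on column i becomes beta_i + beta_j
    v_sum = robust_var(y, Xs, i, G)
    return 0.5 * (v_sum - robust_var(y, X, i, G) - robust_var(y, X, j, G))

# Hypothetical data: two regressors plus an intercept, AR(1) errors.
rng = np.random.default_rng(4)
T, G = 250, 3
z = rng.normal(size=(T, 2))
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + e[t]
X = np.column_stack([np.ones(T), z])
y = 1.0 + z @ np.array([0.5, -0.25]) + u

cov_12 = robust_cov(y, X, 1, 2, G)
```
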
For robust Wald tests of more than one restriction, a quadratic form
needs to be constructed. For the null hypothesis

(2.31)  $H_0: R\beta = r,$

where R is a $Q \times K$ matrix, $Q \leq K$, rank(R) = Q, and r is a $Q \times 1$ vector, the Wald
statistic is given by

(2.32)  $W \equiv \sqrt{T}(R\hat\beta - r)'[R\hat VR']^{-1}\sqrt{T}(R\hat\beta - r)$
        $= (R\hat\beta - r)'[R(\hat V/T)R']^{-1}(R\hat\beta - r),$

where $\hat V$ is generally given by $(X'X/T)^{-1}\hat B(X'X/T)^{-1}$, and $\hat B$ is chosen to be
heteroskedasticity or H/SC-consistent, as needed. Note that the correct
formula for the Wald statistic is obtained by naively treating $\hat\beta$ as if it
were distributed exactly as $N(\beta, \hat V/T)$. Under $H_0$,

(2.33)  $W \overset{a}{\sim} \chi_Q^2.$

Under homoskedasticity and no serial correlation $\hat V$ can be taken to be
$\hat\sigma^2(X'X/T)^{-1}$, where $\hat\sigma^2$ is given by (2.12). Plugging this choice of $\hat V$ into
(2.32) and rearranging yields

(2.34)  $W = (R\hat\beta - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta - r)/\hat\sigma^2 = Q \cdot F,$

where F is the standard F-statistic for testing (2.31). Under (2.10) and
(2.11), F can be treated as approximately distributed as $F_{Q,T-K}$ (this is because
$Q \cdot F_{Q,N} \overset{d}{\to} \chi_Q^2$ as $N \to \infty$). In general, the usual F-statistic does not have a
known limiting distribution in the presence of conditional heteroskedasticity
or serial correlation. In these cases the robust forms of $\hat V$ should be used.
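
The quadratic form (2.32) and its relation to the usual F-statistic can be checked numerically. The sketch below uses hypothetical simulated data and a hypothetical helper `wald_stat`; it plugs in the nonrobust choice of variance estimator, and a robust estimate of the covariance matrix of the OLS estimator could be passed in its place.

```python
import numpy as np

def wald_stat(beta_hat, V_over_T, R, r):
    """W = (R b - r)' [R (V/T) R']^{-1} (R b - r), as in (2.32)."""
    d = R @ beta_hat - r
    return float(d @ np.linalg.solve(R @ V_over_T @ R.T, d))

# Hypothetical data: intercept plus two regressors; H0 excludes both slopes.
rng = np.random.default_rng(2)
T, K, Q = 120, 3, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = 0.5 + 0.3 * X[:, 1] + rng.normal(size=T)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (T - K)              # (2.12)
V_usual = sigma2_hat * np.linalg.inv(X.T @ X)     # nonrobust V/T

R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r = np.zeros(Q)
W = wald_stat(beta_hat, V_usual, R, r)

# Standard F-statistic from restricted and unrestricted sums of squares.
ssr_u = u_hat @ u_hat
ssr_r = np.sum((y - y.mean()) ** 2)
F = ((ssr_r - ssr_u) / Q) / (ssr_u / (T - K))
```

With the nonrobust variance estimator the two quantities satisfy W = Q F up to rounding error, so any difference between a robust Wald statistic and Q F reflects only the choice of variance estimator.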


3.  Regression-Based  Specification  Tests 

Regression-based diagnostics, which are frequently interpreted as
Lagrange multiplier (LM) tests, are quite popular in time series
econometrics. Pagan and Hall (1983) refer to such procedures as "residual
analysis" because the statistics are motivated by examining the residuals
from (in this case) an OLS regression. For example, consider testing the
hypothesis $H_0: \gamma = 0$ in the model

$E(y_t | x_t, \psi_t) = x_t\beta + \psi_t\gamma,$

where $\psi_t$ is a $1 \times Q$ subvector of $(1, z_t, y_{t-1}, z_{t-1}, \ldots)$ with lag lengths not
depending on $t$. As in section 2, $x_t$ generally denotes a $1 \times K$ subvector from
$(1, z_t, y_{t-1}, z_{t-1}, \ldots)$ with lag lengths not depending on $t$. Under $H_0$,

(3.1)  $E(\psi_t' u_t) = 0,$

where $u_t \equiv y_t - x_t\beta$ are the true errors under the null. The obvious way to
operationalize (3.1) is to obtain the OLS residuals $\hat{u}_t$ from the regression

$y_t$ on $x_t$, $t = 1, \ldots, T$

and check to see whether the sample covariance between $\psi_t$ and $\hat{u}_t$,

(3.2)  $T^{-1}\sum_{t=1}^{T} \psi_t'\hat{u}_t,$

is significantly different from zero. This is, in effect, what the Wald
statistic for testing $H_0: \gamma = 0$ does, but it is possible to derive a
statistic directly from (3.2). What is needed is the asymptotic variance of

(3.3)  $T^{-1/2}\sum_{t=1}^{T} \psi_t'\hat{u}_t = T^{-1/2}\sum_{t=1}^{T} \psi_t'u_t - T^{-1}\sum_{t=1}^{T} \psi_t'x_t \sqrt{T}(\hat\beta - \beta),$

where $\hat\beta$ is the OLS estimator of $\beta$. Depending on the assumptions imposed
under $H_0$ there are various ways that (3.3) can be used to derive a test
statistic. Before proceeding further, it is useful to allow for a broader
class of specification tests, as this is obtained without much additional
work. Assume generally that the null hypothesis can be expressed as

(3.4)  $H_0: E(y_t | x_t, \psi_t) = x_t\beta,$
where $\psi_t$ is a $1 \times M$ vector of elements from $(z_t, y_{t-1}, z_{t-1}, \ldots)$. To allow for
tests of neglected nonlinearity and endogeneity, let $\lambda(\psi_t, \eta)$ be a $1 \times Q$ vector
of "misspecification indicators", which is allowed to depend on a vector of
unknown nuisance parameters $\eta$. The choice of $\lambda(\psi_t, \eta)$ depends on the
alternatives against which one would like to have power, and can contain
linear and nonlinear functions of $\psi_t$. Several choices for $\lambda$ will be
discussed in the examples. The parameters $\eta$ are called nuisance parameters
because they need not have an interpretation as "true" parameters under $H_0$,
although $\eta$ is frequently equal to $\beta$.

The  null  hypothesis  can  be  stated  equivalently  as 

(3.5)  $E(u_t | x_t, \psi_t) = 0,$

where $u_t \equiv y_t - x_t\beta$. If $\hat\eta$ is an estimator such that $\sqrt{T}(\hat\eta - \eta) = O_p(1)$ then
a test of (3.4) is based on

$T^{-1}\sum_{t=1}^{T} \hat\lambda_t'\hat{u}_t,$

where the $\hat{u}_t$ are the OLS residuals and $\hat\lambda_t \equiv \lambda(\psi_t, \hat\eta)$ are the estimated
misspecification indicators. Under (3.5) and standard regularity conditions,
a simple mean value expansion shows that

(3.6)  $T^{-1/2}\sum_{t=1}^{T} \hat\lambda_t'\hat{u}_t = T^{-1/2}\sum_{t=1}^{T} \lambda_t'\hat{u}_t + o_p(1),$

where $\lambda_t \equiv \lambda(\psi_t, \eta)$ (e.g. Wooldridge (1990a)). This shows that, under $H_0$, the
asymptotic distribution of $\hat\eta$ does not affect the asymptotic distribution of

(3.7)  $T^{-1/2}\sum_{t=1}^{T} \hat\lambda_t'\hat{u}_t$

as long as $\hat\eta$ is $\sqrt{T}$-consistent for $\eta$. A more convenient form of (3.7) is


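(3.8)  $T^{-1/2}\sum_{t=1}^{T} (\hat\lambda_t - x_t\hat{C})'\hat{u}_t \equiv T^{-1/2}\sum_{t=1}^{T} \hat{r}_t'\hat{u}_t,$

where

$\hat{C} \equiv \left(\sum_{t=1}^{T} x_t'x_t\right)^{-1}\sum_{t=1}^{T} x_t'\hat\lambda_t$

is the $K \times Q$ matrix of regression coefficients from the regression

$\hat\lambda_t$ on $x_t$, $t = 1, \ldots, T,$

and $\hat{r}_t$, $t = 1, \ldots, T$, are the $1 \times Q$ residual vectors from this regression. Let
$\hat\xi_t \equiv \hat{u}_t\hat{r}_t$ and $\xi_t \equiv u_t r_t \equiv u_t(\lambda_t - x_tC)$, where $C \equiv \mathrm{plim}\,\hat{C} = [E(x_t'x_t)]^{-1}E(x_t'\lambda_t)$,
and $u_t \equiv y_t - x_t\beta$. Note that $\xi_t$ is simply the population analog of $\hat\xi_t$: the
estimated quantities $\hat\beta$, $\hat\eta$, and $\hat{C}$ have been replaced by their plims. The
process $\{\hat\xi_t: t = 1, 2, \ldots, T\}$ has the useful property that

(3.9)  $T^{-1/2}\sum_{t=1}^{T} \hat\xi_t - T^{-1/2}\sum_{t=1}^{T} \xi_t \overset{p}{\to} 0$

under $H_0$. To see this, note that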


$$
\begin{aligned}
T^{-1/2}\sum_{t=1}^{T} \hat\xi_t' &\equiv T^{-1/2}\sum_{t=1}^{T} (\hat\lambda_t - x_t\hat{C})'\hat{u}_t \\
&= T^{-1/2}\sum_{t=1}^{T} (\lambda_t - x_tC)'\hat{u}_t + o_p(1) \quad \text{by (3.6)} \\
&= T^{-1/2}\sum_{t=1}^{T} (\lambda_t - x_tC)'u_t - T^{-1}\sum_{t=1}^{T} (\lambda_t - x_tC)'x_t\,\sqrt{T}(\hat\beta - \beta) + o_p(1) \\
&= T^{-1/2}\sum_{t=1}^{T} (\lambda_t - x_tC)'u_t + o_p(1)\cdot O_p(1) + o_p(1)
\end{aligned}
$$

since $E[(\lambda_t - x_tC)'x_t] = 0$ and $\sqrt{T}(\hat\beta - \beta) = O_p(1)$. This establishes (3.9),
and shows that the asymptotic distribution of $T^{-1/2}\sum_{t=1}^{T} \hat\xi_t$ under $H_0$ is obtained
once the easier problem of finding the asymptotic distribution of $T^{-1/2}\sum_{t=1}^{T} \xi_t$
has been solved. Equation (3.8) also makes it clear that the test based on
(3.5) is really a test of

$H_0: E[(\lambda_t - x_tC)'u_t] = 0.$

Because the $\hat{u}_t$ are orthogonal to $x_t$ by construction, the test checks whether
the part of $\lambda_t$ which is uncorrelated with $x_t$ is correlated with $u_t$.

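A test of (3.5) can be constructed for most stationary vector processes
$\{\xi_t: t = 1, 2, \ldots\}$. If

$\Xi \equiv \lim_{T\to\infty} V\left[T^{-1/2}\sum_{t=1}^{T} \xi_t'\right] = E(\xi_t'\xi_t) + \sum_{j=1}^{\infty}\left[E(\xi_t'\xi_{t+j}) + E(\xi_{t+j}'\xi_t)\right]$

is nonsingular, and if the central limit theorem holds for $\{\xi_t\}$, then
under $H_0$,

(3.10)  $\left(T^{-1/2}\sum_{t=1}^{T} \xi_t\right)\Xi^{-1}\left(T^{-1/2}\sum_{t=1}^{T} \xi_t\right)' \overset{d}{\to} \chi^2_Q.$

If $\hat\Xi$ is a consistent estimator of $\Xi$, e.g.

$\hat\Xi = T^{-1}\sum_{t=1}^{T} \hat\xi_t'\hat\xi_t + T^{-1}\sum_{j=1}^{G} \varphi(j,G)\sum_{t=j+1}^{T}\left[\hat\xi_t'\hat\xi_{t-j} + \hat\xi_{t-j}'\hat\xi_t\right]$

with $G = o(T^{1/4})$ and $\varphi(j,G)$ given by (2.23), then a computable statistic is

(3.11)  $\left(T^{-1/2}\sum_{t=1}^{T} \hat\xi_t\right)\hat\Xi^{-1}\left(T^{-1/2}\sum_{t=1}^{T} \hat\xi_t\right)'.$

The statistic (3.11) has an asymptotic $\chi^2_Q$ distribution under $H_0$, and allows
$E(u_t^2 | x_t, \psi_t)$ and $E(u_{t+j}u_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t)$ to be of fairly arbitrary form.
Thus, (3.11) is one possible approach to computing a test statistic that is
robust to heteroskedasticity and serial correlation.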
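
As a rough numerical sketch of (3.11), the function below builds $\hat\Xi$ from a $T \times Q$ array whose rows are $\hat\xi_t$ and returns the quadratic form. The name `hac_lm_stat` is hypothetical, and the weights are assumed to be the Bartlett weights $1 - j/(G+1)$; the document's $\varphi(j,G)$ in (2.23) may differ.

```python
import numpy as np

def hac_lm_stat(xi, G):
    """Quadratic form (3.11) from rows xi_hat_t (T x Q array).

    Assumption: phi(j, G) = 1 - j / (G + 1) (Bartlett weights).
    """
    T = xi.shape[0]
    Xi = xi.T @ xi / T                          # T^{-1} sum xi_t' xi_t
    for j in range(1, G + 1):
        w = 1.0 - j / (G + 1.0)                 # assumed weight phi(j, G)
        Gam = xi[j:].T @ xi[:-j] / T            # T^{-1} sum_{t>j} xi_t' xi_{t-j}
        Xi += w * (Gam + Gam.T)
    s = xi.sum(axis=0) / np.sqrt(T)             # T^{-1/2} sum xi_t
    return float(s @ np.linalg.solve(Xi, s))
```

With `G = 0` the autocovariance terms drop out and the value coincides with $T - \mathrm{SSR}$ from regressing 1 on $\hat\xi_t$.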

Frequently  it  is  useful  to  have  available  a  statistic  that  can  be 
computed  via  OLS  regressions.   It  turns  out  that  such  a  statistic  can  be 
derived  which  is  still  heteroskedasticity  and  serial  correlation  robust.   The 
idea is simple: if $\xi_t$ were a VAR(G) process (which is necessarily stable
because of the ergodicity assumption), i.e.

(3.12)  $\xi_t = \xi_{t-1}R_1 + \xi_{t-2}R_2 + \ldots + \xi_{t-G}R_G + \nu_t,$

where $\{\nu_t\}$ is a sequence of $1 \times Q$ uncorrelated errors, then $E(\xi_t) = 0$ if and
only if $E(\nu_t) = 0$. A test of $H_0$ can be based on

(3.13)  $\left(T^{-1/2}\sum_{t=1}^{T} \nu_t\right)\left(T^{-1}\sum_{t=1}^{T} \nu_t'\nu_t\right)^{-1}\left(T^{-1/2}\sum_{t=1}^{T} \nu_t\right)',$

which has an asymptotic $\chi^2_Q$ distribution under $H_0$. To operationalize (3.13)
$\nu_t$ can be estimated as the residuals $\hat\nu_t$ from the vector autoregression

$\hat\xi_t$ on $\hat\xi_{t-1}, \ldots, \hat\xi_{t-G}$.

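To justify replacing $\nu_t$ with $\hat\nu_t$ in (3.13), note that

$T^{-1/2}\sum_{t=1}^{T} \hat\nu_t - T^{-1/2}\sum_{t=1}^{T} \nu_t = \sum_{j=0}^{G} T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j}\hat{R}_j - \xi_{t-j}R_j),$

where $R_0 = \hat{R}_0 = I_Q$ and, for $j \ge 1$, $R_j$ here denotes the negative of the $j$th
$Q \times Q$ coefficient matrix from the VAR (so that $\nu_t = \sum_{j=0}^{G} \xi_{t-j}R_j$), with $\hat{R}_j$
defined analogously from the estimated VAR. For each $j = 0, \ldots, G$,

$$
\begin{aligned}
(3.14)\quad T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j}\hat{R}_j - \xi_{t-j}R_j)
&= T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j}\hat{R}_j - \hat\xi_{t-j}R_j) + T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j}R_j - \xi_{t-j}R_j) \\
&= T^{-1/2}\sum_{t=1}^{T} \hat\xi_{t-j}(\hat{R}_j - R_j) + T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j} - \xi_{t-j})R_j.
\end{aligned}
$$

But under $H_0: E(\xi_t) = 0$,

$T^{-1/2}\sum_{t=1}^{T} (\hat\xi_{t-j} - \xi_{t-j}) = o_p(1)$

and, by the CLT,

$T^{-1/2}\sum_{t=1}^{T} \xi_{t-j} = O_p(1).$

Under standard regularity conditions, $\hat{R}_j \overset{p}{\to} R_j$, and so $\hat{R}_j - R_j = o_p(1)$ under $H_0$.
Thus, both terms on the right hand side of (3.14) are $o_p(1)$ under $H_0$, and it
follows that

$T^{-1/2}\sum_{t=1}^{T} \hat\nu_t - T^{-1/2}\sum_{t=1}^{T} \nu_t = o_p(1)$

under $H_0$. A valid test statistic is

$\left(T^{-1/2}\sum_{t=1}^{T} \hat\nu_t\right)\left(T^{-1}\sum_{t=1}^{T} \hat\nu_t'\hat\nu_t\right)^{-1}\left(T^{-1/2}\sum_{t=1}^{T} \hat\nu_t\right)',$

which has an asymptotic $\chi^2_Q$ distribution under $H_0$. This statistic is easily
seen to be $TR_u^2 = T - \mathrm{SSR}$ from the regression

1 on $\hat\nu_t$, $t = 1, \ldots, T$.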

To  summarize,  the  heteroskedasticity/serial  correlation-robust  procedure  is 


PROCEDURE  3.1: 

(i)  Obtain $\hat{u}_t$ as the residuals from the OLS regression

        $y_t$ on $x_t$.

Compute the $1 \times Q$ vector indicator $\hat\lambda_t \equiv \lambda(\psi_t, \hat\eta)$.

(ii)  Obtain $\hat{r}_t$ as the $1 \times Q$ vectors of residuals from the regression

        $\hat\lambda_t$ on $x_t$.

(iii)  Define $\hat\xi_t$ to be the $1 \times Q$ vector $\hat\xi_t \equiv \hat{u}_t \cdot \hat{r}_t$. Save the $1 \times Q$
residuals $\hat\nu_t$ from the VAR(G) regression

        $\hat\xi_t$ on $\hat\xi_{t-1}, \ldots, \hat\xi_{t-G}$.

(iv)  Use $TR_u^2 = T - \mathrm{SSR}$ from the regression

        1 on $\hat\nu_t$

as asymptotically $\chi^2_Q$ under $H_0$. In practice, one uses as T the actual number
of observations used in this final regression.   ■
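
The four steps of Procedure 3.1 can be sketched with plain least squares as below. This is a minimal illustration rather than a reference implementation: the names `ols_resid` and `procedure_3_1` are hypothetical, `lam` is assumed to already hold the estimated indicators $\hat\lambda_t$ as a $T \times Q$ array, and the VAR in step (iii) is fitted without an intercept, as in (3.12).

```python
import numpy as np

def ols_resid(Y, X):
    """Residuals from a least-squares regression of Y on X."""
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    return Y - X @ coef

def procedure_3_1(y, X, lam, G):
    """H/SC-robust statistic T - SSR following steps (i)-(iv)."""
    lam = np.asarray(lam, dtype=float).reshape(len(y), -1)
    u = ols_resid(y, X)                      # (i)  u_hat_t
    r = ols_resid(lam, X)                    # (ii) r_hat_t, T x Q
    xi = u[:, None] * r                      # (iii) xi_hat_t = u_hat_t * r_hat_t
    if G > 0:
        T = xi.shape[0]
        # Stack xi_{t-1}, ..., xi_{t-G} for t = G, ..., T-1 (0-based rows)
        lags = np.hstack([xi[G - j:T - j] for j in range(1, G + 1)])
        nu = ols_resid(xi[G:], lags)         # VAR(G) residuals nu_hat_t
    else:
        nu = xi
    ones = np.ones(nu.shape[0])
    ssr = np.sum(ols_resid(ones, nu) ** 2)   # (iv) regress 1 on nu_hat_t
    return nu.shape[0] - ssr                 # T - SSR, with T = obs actually used
```

The returned value would be compared with a $\chi^2_Q$ critical value; with `G = 0` the function reduces to the version without the VAR filtering step.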

The only step which is not automatic in Procedure 3.1, besides choosing
the misspecification indicator $\lambda_t$ (which will be discussed shortly), is the
choice of G in step (iii). This is conceptually the same problem as choosing
G in computing the Newey-West or Gallant-White covariance matrix estimator,
and differs depending on the problem. The choice of G might depend on the
frequency of the data and can differ across misspecification indicators. The
key is to choose G so that $\{\nu_t\}$ is approximately uncorrelated. But if G is
chosen too large relative to T, the chi-square distribution may not be a good
approximation to the distribution of the test statistic.

It must be emphasized that Procedure 3.1 is not the same as assuming
that the errors from the original model $\{u_t\}$ follow an AR(G) process and then
computing $\hat\beta$ and test statistics based on a Cochrane-Orcutt type procedure.
Such a procedure imposes strict exogeneity of $x_t$ and common factor
restrictions on the dynamic regression $E(y_t | x_t, \phi_{t-1})$, which are not
necessarily intended under $H_0$. The VAR(G) in step (iii) is used to obtain
estimates of $\nu_t$. As long as G is selected appropriately, $\hat\nu_t$ will be
approximately uncorrelated and step (iv) produces a valid test statistic. If
G is too small then $T^{-1}\sum_{t=1}^{T} \hat\nu_t'\hat\nu_t$ is an inconsistent estimate of the variance of
$T^{-1/2}\sum_{t=1}^{T} \nu_t$, but the statistic still has a well-defined limiting distribution
under $H_0$; in contrast, a Cochrane-Orcutt type procedure could inappropriately
reject $H_0$ with probability going to one.

There are two cases where Procedure 3.1 can be simplified. The first is
when the null hypothesis imposes homoskedasticity and no serial correlation,
applied now to $u_t$ and $(x_t, \psi_t)$. In other words, (2.10) and (2.11) are
replaced by $E(u_{t+j}u_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t) = 0$, $j \ge 1$, and $E(u_t^2 | x_t, \psi_t) = \sigma^2$,
respectively. In this case, using the fact that $\lambda_t$ is a function of $\psi_t$,

$$
\begin{aligned}
E(\xi_t'\xi_t) &= E[E(\xi_t'\xi_t | x_t, \psi_t)] \\
&= E[E\{u_t^2(\lambda_t - x_tC)'(\lambda_t - x_tC) | x_t, \psi_t\}] \\
&= E[E(u_t^2 | x_t, \psi_t)(\lambda_t - x_tC)'(\lambda_t - x_tC)] \\
&= \sigma^2 E[(\lambda_t - x_tC)'(\lambda_t - x_tC)] \\
&\equiv \sigma^2 E(r_t'r_t).
\end{aligned}
$$

Also,

$$
\begin{aligned}
E(\xi_{t+j}'\xi_t) &= E[E(\xi_{t+j}'\xi_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t)] \\
&= E[E\{u_{t+j}u_t r_{t+j}'r_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t\}] \\
&= E[E(u_{t+j}u_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t)\, r_{t+j}'r_t] \\
&= 0.
\end{aligned}
$$

Under homoskedasticity and no serial correlation, $\Xi$ has the very simple form

$\Xi = E(\xi_t'\xi_t) = \sigma^2 T^{-1}\sum_{t=1}^{T} E(r_t'r_t),$

and is consistently estimated by

$\hat\sigma^2 T^{-1}\sum_{t=1}^{T} \hat{r}_t'\hat{r}_t,$

where $\hat\sigma^2 \equiv T^{-1}\sum_{t=1}^{T} \hat{u}_t^2$. If this expression is used for $\hat\Xi$ in (3.11), the
resulting statistic has a limiting $\chi^2_Q$ distribution under $H_0$ and the
additional assumptions of homoskedasticity and no serial correlation. It
turns out that step (ii) of Procedure 3.1 is no longer necessary; instead,
the statistic is computed as $TR_u^2$ from the regression

(3.15)  $\hat{u}_t$ on $x_t$, $\hat\lambda_t$;

if $x_t$ contains a constant then $\hat{u}_t$ has zero sample average and then the usual
r-squared can be used as $R_u^2$. This form of the LM statistic is well-known and
has been discussed by many authors; for examples and many references, see
Pagan and Hall (1983) and Engle (1984). The computational simplicity of
regression (3.15) is somewhat offset by its nonrobustness to
heteroskedasticity and/or serial correlation. Because it is by far the most
popular form of the LM statistic used, its properties and limitations should
be understood.
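
For comparison, the nonrobust statistic from regression (3.15) can be sketched as follows. The function name `lm_tr2` is hypothetical and `lam` is assumed to hold $\hat\lambda_t$ as a $T \times Q$ array; the uncentered r-squared is used so that the code does not rely on $x_t$ containing a constant.

```python
import numpy as np

def lm_tr2(y, X, lam):
    """T * R_u^2 from regressing the OLS residuals on (x_t, lambda_hat_t)."""
    T = len(y)
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]    # residuals under H0
    W = np.column_stack([X, lam])                        # regressors in (3.15)
    e = u - W @ np.linalg.lstsq(W, u, rcond=None)[0]     # residuals of (3.15)
    r2_u = 1.0 - (e @ e) / (u @ u)                       # uncentered R-squared
    return T * r2_u
```

Because $\hat{u}_t$ is orthogonal to $x_t$, this value equals the quadratic form (3.11) evaluated with $\hat\sigma^2 T^{-1}\sum \hat{r}_t'\hat{r}_t$ in place of $\hat\Xi$, which is why step (ii) can be skipped in this case.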

The homoskedasticity assumption can never be guaranteed to hold under
the null as it concerns $V(y_t | x_t, \psi_t)$, not $E(y_t | x_t, \psi_t)$. On the other hand, in
section 2 it was shown that the absence of serial correlation is a
consequence of $E(y_t | x_t)$ being dynamically correctly specified. If the null
hypothesis is

(3.16)  $H_0: E(y_t | x_t) = E(y_t | x_t, \phi_{t-1})$

then, for any subvector $\psi_t$ of elements from $(x_t, \phi_{t-1})$,

(3.17)  $E(u_{t+j}u_t | x_{t+j}, \psi_{t+j}, x_t, \psi_t) = 0$

by the law of iterated expectations. Consequently, if the null hypothesis
imposes (3.16) either explicitly or implicitly, then there is no need to make
the tests robust to serial correlation. Two examples of this are testing for
Granger causality and testing for serial correlation; both of these take the
null model to be dynamically complete. Obtaining tests for dynamic
misspecification that are robust to heteroskedasticity is accomplished by
simplifying Procedure 3.1. Because $\{\xi_t\}$ is serially uncorrelated under
(3.16), $\Xi$ is consistently estimated by

(3.18)  $T^{-1}\sum_{t=1}^{T} \hat\xi_t'\hat\xi_t$

whether or not $E(u_t^2 | x_t, \psi_t)$ is constant. Using (3.18) as $\hat\Xi$ in (3.11) is the
same as skipping step (iii) in Procedure 3.1 and going directly to step (iv)
with $\hat\xi_t$ in place of $\hat\nu_t$. Thus, the heteroskedasticity-robust LM statistic is
obtained by performing steps (i), (ii), and

(iii')  Use $TR_u^2 = T - \mathrm{SSR}$ from the regression

        1 on $\hat\xi_t$

as asymptotically $\chi^2_Q$ under (3.5) and (3.17). Again, T here corresponds to
the actual number of observations used in this last regression.

Procedure (i), (ii), and (iii') is robust in the presence of
heteroskedasticity, and loses nothing asymptotically in the event
that heteroskedasticity is not present (Wooldridge (1990a)).
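
Steps (i), (ii), and (iii') amount to three least-squares fits, sketched below. The function name `hetero_robust_lm` is hypothetical, and `lam` is assumed to hold the estimated indicators $\hat\lambda_t$ with one row per observation.

```python
import numpy as np

def hetero_robust_lm(y, X, lam):
    """Heteroskedasticity-robust LM statistic via steps (i), (ii), (iii')."""
    lam = np.asarray(lam, dtype=float).reshape(len(y), -1)  # force T x Q
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]        # (i)  u_hat_t
    r = lam - X @ np.linalg.lstsq(X, lam, rcond=None)[0]    # (ii) r_hat_t
    xi = u[:, None] * r                                     # xi_hat_t = u_hat_t * r_hat_t
    ones = np.ones(len(y))
    resid = ones - xi @ np.linalg.lstsq(xi, ones, rcond=None)[0]
    return len(y) - resid @ resid                           # (iii') T - SSR
```

The result equals $(\sum \hat\xi_t)(\sum \hat\xi_t'\hat\xi_t)^{-1}(\sum \hat\xi_t)'$, that is, (3.11) with (3.18) used as $\hat\Xi$.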

EXAMPLE  3.1  (Omitted  Variables  in  a  Static  Regression  Model):   Consider  the 
model 

(3.19)  $y_t = \alpha + z_t\delta + \psi_t\gamma + u_t, \quad E(u_t | z_t, \psi_t) = 0,$

where the $1 \times Q$ vector $\psi_t$ is, like $z_t$, a set of contemporaneous variables.
Interest lies in testing $H_0: \gamma = 0$ or $E(y_t | z_t, \psi_t) = E(y_t | z_t)$. Nothing
guarantees that $u_t$ will be homoskedastic or serially uncorrelated under $H_0$;
the testable implication of $H_0$ is $E(\psi_t'u_t) = 0$. If interest lies in testing
exclusion of $\psi_t$ in a serial correlation robust manner then Procedure 3.1 can
be applied by setting $x_t \equiv (1, z_t)$ and $\lambda(\psi_t, \eta) \equiv \psi_t$. The residuals $\hat{u}_t$ are
obtained under $H_0$ from the regression

$y_t$ on 1, $z_t$,

and the $\hat{r}_t$ are obtained from the regression $\psi_t$ on 1, $z_t$. An H/SC Chow test
is obtained by setting $\psi_t \equiv (d_t, d_tz_t)$, where $d_t$ is a dummy variable equal to
unity after the hypothesized break point. The same procedure works if $x_t$ and
$\psi_t$ are replaced by more general regressors.
EXAMPLE 3.2 (Testing Functional Form in a Static Regression Model):  Suppose
that $H_0$ is specified generally as

$H_0: E(y_t | z_t) = \alpha + z_t\delta \equiv x_t\beta.$

A test for nonlinearities can be obtained, for example, by choosing $\lambda(\psi_t, \eta) \equiv
((x_t\beta)^2, (x_t\beta)^3)$, as in Ramsey's (1969) RESET. Then $Q = 2$, $\eta \equiv \beta$, and $\hat\lambda_t =
(\hat{y}_t^2, \hat{y}_t^3)$, where $\hat{y}_t$ are the fitted values from the regression $y_t$ on 1, $z_t$; the
$\hat{u}_t$ are obtained from the same regression. Note that $\psi_t = x_t = (1, z_t)$, and
nothing guarantees that $u_t$ is homoskedastic or serially uncorrelated
under $H_0$. Other functions of $z_t$ can be used in $\hat\lambda_t$ such as the fitted values
from a nonlinear regression. Also, the same procedure applies to the general
model

$E(y_t | x_t) = x_t\beta,$

where $x_t$ can contain lagged values of $y_t$ and/or $z_t$.   ■
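
The RESET-type indicators of Example 3.2 are simple to construct; the sketch below returns the residuals $\hat{u}_t$ and $\hat\lambda_t = (\hat{y}_t^2, \hat{y}_t^3)$. The function name `reset_indicators` is hypothetical, and `X` is assumed to contain the constant and $z_t$.

```python
import numpy as np

def reset_indicators(y, X):
    """Return OLS residuals and the indicators (yhat^2, yhat^3)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # null-model OLS fit
    fitted = X @ beta                              # y_hat_t = x_t beta_hat
    u = y - fitted                                 # u_hat_t
    lam = np.column_stack([fitted ** 2, fitted ** 3])
    return u, lam
```

The array `lam` would then be used as $\hat\lambda_t$ in Procedure 3.1, since neither homoskedasticity nor serial independence is implied by the null here.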

EXAMPLE  3.3  (Testing  for  Additional  Lags  in  a  Distributed  Lag  Model): 
Consider  the  finite  DL  model 

$E(y_t | z_t, z_{t-1}, \ldots) = \alpha + z_t\delta_0 + \ldots + z_{t-M}\delta_M,$

where $z_t$ is a scalar for simplicity. For $P < M$, suppose the hypothesis of
interest is

$H_0: \delta_{P+1} = \ldots = \delta_M = 0.$

The number of restrictions is $Q \equiv M - P$, $x_t \equiv (1, z_t, z_{t-1}, \ldots, z_{t-P})$, $\hat\lambda_t =
\psi_t \equiv (z_{t-P-1}, \ldots, z_{t-M})$, and $\hat{u}_t$ is obtained from the restricted regression

$y_t$ on 1, $z_t, \ldots, z_{t-P}$.

Again, the heteroskedasticity and serial correlation robust test is
appropriate here, as nothing ensures that the errors $u_t$ are serially
uncorrelated under $H_0$.   ■

EXAMPLE  3.4  (Testing  for  Serial  Correlation  in  a  General  Dynamic  Model): 
Suppose that under $H_0$,

$y_t = x_t\beta + u_t, \quad E(u_t | x_t, \phi_{t-1}) = 0.$

As mentioned above, a heteroskedasticity-robust LM statistic for AR(Q) serial
correlation is obtained via steps (i), (ii), and (iii') with $\hat\lambda_t \equiv
(\hat{u}_{t-1}, \ldots, \hat{u}_{t-Q})$ and $\hat{u}_t$ obtained from regressing $y_t$ on $x_t$ (note carefully the
subscripts on the lagged residuals comprising $\hat\lambda_t$).   ■
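
The subscripts on the lagged residuals are easy to get wrong in code, so a small sketch may help. The function name `lagged_residual_indicators` is hypothetical; it drops the first $Q$ observations, for which the lags are unavailable, which is one possible convention.

```python
import numpy as np

def lagged_residual_indicators(y, X, Q):
    """Return (u_hat_t, x_t, lambda_hat_t) for the T - Q usable observations,
    where lambda_hat_t = (u_hat_{t-1}, ..., u_hat_{t-Q})."""
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # full-sample OLS residuals
    T = len(u)
    # Column j holds u_hat_{t-j} for t = Q, ..., T-1 (0-based indexing)
    lam = np.column_stack([u[Q - j:T - j] for j in range(1, Q + 1)])
    return u[Q:], X[Q:], lam
```

The returned arrays are aligned row by row, so steps (ii) and (iii') can be applied to them directly, with $T - Q$ playing the role of the number of observations used.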

EXAMPLE  3.5  (Testing  for  Granger  Causality):   The  null  hypothesis  in  this 
case  is 

$E(y_t | y_{t-1}, z_{t-1}, y_{t-2}, z_{t-2}, \ldots) = E(y_t | y_{t-1}, y_{t-2}, \ldots),$

and to operationalize this it is assumed further that

$E(y_t | y_{t-1}, y_{t-2}, \ldots) = E(y_t | y_{t-1}, \ldots, y_{t-P}) = \alpha + \delta_1y_{t-1} + \ldots + \delta_Py_{t-P}.$

The lag length P needs to be selected, usually by choosing a value and then
testing for additional lags of y. To test for Granger Causality, let $\hat{u}_t$ be
the residuals from the regression

$y_t$ on 1, $y_{t-1}, \ldots, y_{t-P}$.

Then set $\hat\lambda_t = (z_{t-1}, \ldots, z_{t-Q})$ for some $Q \ge 1$, and use this either in
regression (3.15) or in the heteroskedasticity-robust procedure (i), (ii),
and (iii'). Note that the choice of Q is entirely up to the researcher.

4.  Some  Considerations  When  Applying  Specification  Tests

The  results  of  sections  2  and  3  provide  simple  procedures  for  performing 
inference  in  linear  time  series  models  with  ergodic  data.   Because  time 
series  analyses  differ  in  their  goals,  the  manner  in  which  the  various  tests 
in  section  3  are  applied  can  differ  across  applications.   Choosing  a  sensible 
strategy  first  requires  deciding  which  relationship(s)  between  y  and  z  is  of 
interest. If the goal is to estimate a model for $E(y_t | z_t, y_{t-1}, z_{t-1}, \ldots)$,
$E(y_t | y_{t-1}, z_{t-1}, \ldots)$, or $E(y_t | y_{t-1}, y_{t-2}, \ldots)$, then the tests are either
implicitly or explicitly tests of dynamic specification. Such is the case
for tests for serial correlation or Granger Causality, as well as the tests
for common factor restrictions discussed below. Computation of the
specification tests is simplified in this case because they need not be made
serial correlation robust. It makes sense to compute both the standard form
of the tests (either the F-test or LM test (3.15)) and the
heteroskedasticity-robust LM test developed in section 3. Dynamic forms of
heteroskedasticity are often found in regressions with financial data series,
so the heteroskedasticity-robust forms might be particularly useful in
testing asset pricing models.

There are certain problems for which the static expectation $E(y_t | z_t)$,
the distributed lag expectation $E(y_t | z_t, z_{t-1}, \ldots)$, or some other dynamically
incomplete expectation is of interest. In this case one must distinguish
among several null hypotheses. Godfrey (1987) has recently recommended a
sequential specification testing strategy which attempts to test hypotheses
in a logically consistent manner. The strategy suggested here is related to
Godfrey's approach but differs in certain respects, including the form of the
specification tests used.

First consider the case where the null hypothesis specifies a static
linear model relating $y_t$ to $z_t$. The first hypothesis of interest is the
linearity of the conditional expectation $E(y_t | z_t)$. More formally, the
hypothesis is

(4.1)  $E(y_t | z_t) = \alpha + z_t\delta$    (linearity).

A test of (4.1) can only be based on indicators that are nonlinear functions
of $z_t$, say $\lambda(z_t, \hat\eta)$ (e.g. see Example 3.2); to be robust to heteroskedasticity
and serial correlation (neither of which can be ruled out if the null is
(4.1)), Procedure 3.1 should be used.

A  second  hypothesis  that  is  frequently  of  interest  is  whether  an 
additional  set  of  contemporaneous  variables  can  be  excluded  from  the  linear 
model. If $\psi_t$ is a $1 \times Q$ vector of contemporaneous variables (in addition to
$z_t$), then the null hypothesis is

(4.2)  $E(y_t | z_t, \psi_t) = \alpha + z_t\delta$    (exclusion restrictions).

Assuming that the alternative to (4.2) is the linear model

(4.3)  $E(y_t | z_t, \psi_t) = \alpha + z_t\delta + \psi_t\gamma,$

(4.2) is tested using the H/SC-robust Procedure 3.1 with $x_t \equiv (1, z_t)$ and $\hat\lambda_t \equiv
\psi_t$. Applying Procedure 3.1 to both hypotheses (4.1) and (4.2) ensures that
only the relevant nulls are assumed under $H_0$; $E(y_t | z_t, \phi_{t-1})$ and $V(y_t | z_t, \psi_t)$
are unrestricted up to regularity conditions.

It is important to stress that the hypotheses (4.1) and (4.2) are very
different in that they restrict different conditional expectations: (4.1)
restricts $E(y_t | z_t)$ while (4.2) restricts $E(y_t | z_t, \psi_t)$ (and hence $E(y_t | z_t)$).
It is quite possible that (4.1) holds but (4.2) does not (e.g. if $(y_t, z_t, \psi_t)$
are jointly normally distributed with $\gamma \ne 0$ in (4.3)). Further, if (4.1)
holds then the tests for linearity of $E(y_t | z_t)$ discussed in Example 3.2 have
no power for detecting violations of (4.2) (i.e. the asymptotic power is
equal to the asymptotic size). Generally, RESET has power against
$E(y_t | z_t, \psi_t) = \alpha + z_t\delta + \psi_t\gamma$, $\gamma \ne 0$, only if $E(\psi_t | z_t)$ is nonlinear. The only
way to really test for omitted variables is to use those variables ($\psi_t$) in an
LM or F-test. Although these comments are clearly illustrated when models
are stated in terms of conditional expectations, there has been some
confusion on this point in the literature (e.g. Thursby (1985)). The
confusion arises when writing the model in error form and not accounting for
the change in the coefficient on $z_t$ when the conditioning set is reduced from
$(z_t, \psi_t)$ to $z_t$.

Next,  one  might  want  to  test  whether  the  static  conditional  expectation 
is  equal  to  the  DL  expectation,  i.e. 

(4.4)  $E(y_t | z_t, z_{t-1}, \ldots) = E(y_t | z_t)$    (no distributed lag dynamics).

Again assuming linearity under $H_0$, this test is covered by Example 3.3 with
$P = 0$ and $M = Q$ to be chosen by the researcher. Again, Procedure 3.1 should be
used because (4.4) implies nothing about $E(y_t | z_t, \phi_{t-1})$ or $V(y_t | z_t, z_{t-1}, \ldots)$.
As with hypotheses (4.1) and (4.2) the null model is $\alpha + z_t\delta$. But (4.4) is a
hypothesis about a different expectation.

In some cases it might be hypothesized that the static expectation is in
fact equal to the dynamic expectation:

(4.5)  $E(y_t | z_t, \phi_{t-1}) = E(y_t | z_t)$    (correct dynamic specification).

Hypothesis (4.5) is of interest sometimes simply for the reason that the
presence of serial correlation invalidates the use of the usual OLS test
statistics. The most popular methods of testing (4.5) are (i) by testing for
serial correlation in the errors, as in Example 3.4, or (ii) by forming an
alternative model

(4.6)  $y_t = \alpha + z_t\delta + \psi_t\gamma + u_t,$

where $\psi_t$ now contains lagged values of $y_t$ and/or $z_t$, and testing for
exclusion of $\psi_t$ by an F or LM test. The heteroskedasticity-robust LM test is
obtained with $\hat\lambda_t \equiv \psi_t$ in steps (i), (ii), and (iii').

Hypotheses  (4.1),  (4.2),  (4.4),  and  (4.5)  represent  restrictions  on  four 
different  conditional  expectations,  even  though  the  null  specifies  the  same 
model.   If  the  LM-type  Procedure  3.1  and  its  variants  are  used,  then  all 
tests are based on the residuals $\hat{u}_t$ from the static regression

$y_t$ on 1, $z_t$, $t = 1, \ldots, T$.

The misspecification indicator $\hat\lambda_t$ determines against which alternatives the
test is likely to have power.

The analysis for a null finite distributed lag model is analogous to
that for the static model. The null model is of the form $\alpha + z_t\delta_0 + z_{t-1}\delta_1 +
\ldots + z_{t-P}\delta_P$. The analogs of (4.1), (4.2), (4.4), and (4.5) are almost
immediate. For example, the null of correct dynamic specification is
expressed as

$E(y_t | z_t, \phi_{t-1}) = E(y_t | z_t, z_{t-1}, \ldots).$

This can be tested by including lags of $y_t$ in the DL model and computing the
F-statistic or by using the heteroskedasticity-robust LM procedure with $\hat\lambda_t$
containing lagged $y_t$.

The above analysis stresses that it is a good idea to compute serial
correlation-robust standard errors when testing hypotheses about expectations
other than $E(y_t | x_t, \phi_{t-1})$, for any $1 \times K$ subvector $x_t$. Nevertheless, because of
its simplicity and proven usefulness, a more popular alternative for static
or DL models is to use Cochrane-Orcutt methods to estimate an AR(1) process
for the errors. If the AR(1) assumption is correct then this leads not only
to consistent estimates of the coefficients in the static or DL model, but
also to more efficient estimates. The static case where $z_t$ is $1 \times J$ is given
by equations (2.16) and (2.17).

There are two essentially distinct tests that one can perform on the
static/AR(1) model. The first focuses on the dynamic regression and the
common factor restrictions. Recall from section 2 that the common factor
restrictions (2.21) impose J nonlinear restrictions on the parameters of the
dynamic expectation $E(y_t | z_t, \phi_{t-1})$. Because (2.21) is necessary for (2.16)
and (2.17) to hold, a rejection implies that the usual statistics based on
quasi-differenced regressions are invalid. And, of course, if one is
interested only in $E(y_t | z_t, \phi_{t-1})$, then the nonlinear constraints imposed on
the parameters of this expectation should be justified.

One way to test these restrictions on the dynamic regression model is to
estimate the unrestricted vector $(\alpha_1, \gamma_1', \rho_1, \gamma_2')'$ by the OLS regression

$y_t$ on 1, $z_t$, $y_{t-1}$, $z_{t-1}$,

and to form the Wald test for the J nonlinear restrictions, as in Sargan
(1964). Because the null hypothesis is correct dynamic specification, there
is no need to make the statistic robust to serial correlation; on the other
hand, a heteroskedasticity-robust version may be warranted.

An LM test is easily computed if the model is estimated by NLS. It too
tests the restrictions (2.21) in model (2.20), and has no direct bearing on the
consistency of C-O or NLS for $\delta$ in (2.16). A simple example was covered in
section 2 where the common factor restrictions hold yet $\delta_0 \ne \delta$.

If the goal of testing the static/AR(1) model is to examine whether C-O
estimates are consistent for $\delta$, then a different strategy is needed. First,
it is important to derive the weakest set of conditions under which C-O
consistently estimates $\delta$. Recall that $u_t$ is defined by

(4.7)  $u_t \equiv y_t - E(y_t | z_t) = y_t - \alpha - z_t\delta \equiv y_t - x_t\beta.$

Whether or not (2.17) is true, C-O consistently estimates $\rho$ in the equation

(4.8)  $E(u_t | u_{t-1}) = \rho u_{t-1}.$

This is because the first stage estimator of $\beta$ is the OLS estimator $\hat\beta$, which
is consistent for $\beta$ in (4.7); then the autoregression of $\hat{u}_t$ on $\hat{u}_{t-1}$
consistently estimates $\rho$ given by (4.8). The important step is then
obtaining an estimator of $\beta$ from a regression on quasi-differenced data; in
what follows let $(\tilde\beta, \tilde\rho)$ be the C-O estimators, which may or may not be
iterated after the first quasi-differenced regression. Then


P   = 


C       T        .  -Ij-  T        ^ 

T   y  x'x     T  y  x'y 

'^T  t  t  ^,  t-'t 

*-   t=l    -^   *-  t=l    -' 


where  x  =  x   -  px   ,  and  y  =  y   -  py   ,  .   Straightforward  algebra  shows 
t    t   ^  t-1     ■'t   ^t        ^t-1        ^  ^ 

that 


so  that 


y  =x5+u   -pu 
■^t    t^  t   ^  t- 


1' 


/8  =  /fl  + 


T   y  x'x      T    y  X'  (u   -  pu   ,  ) 

^.,  t  t       ^,  t^  t     t-1 


T     ■>,  -Ir   ,  T 
c'x 

t=l     ^   ^    t=l 

T        ^ -1.       T 

I  x'x       T"^  y 

t=l    -'   '^    t=l 


r   1  ^  -  -  r  ^  r    i  ^ 

=  ^-Tyx'x     Ty(x',u   +x'u   ,) 

-^^  tt  ^,t-it      tt-i 


p   +   Op(l) 


The last equality follows because $E(x_t'u_t) = E(x_{t-1}'u_{t-1}) = 0$. By stationarity
and the weak law of large numbers, $\tilde\beta \overset{p}{\to} \beta$ if and only if
$E[(x_{t-1} + x_{t+1})'u_t] = 0$; because $E(u_t) = 0$ under (4.7), the condition reduces to

(4.9)  $E[(z_{t-1} + z_{t+1})'u_t] = 0.$

Thus, along with (4.7), condition (4.9) is the one underlying consistency of
C-O in a static regression model. (Equation (4.8) is taken to be
definitional, because the conditional expectation can always be replaced by
the linear projection operator.) Note that (4.9) is not the same as the
exogeneity condition (2.19) unless

(4.10)  $E(z_{t-1}'u_t) = 0$

is maintained. In many static regressions (4.10) is assumed to be true;
otherwise one would probably estimate a DL model. If a static model is
estimated under the belief that there are no DL dynamics, then it makes sense
to separate the hypotheses (4.10) and (2.19). Violation of (4.10) affects
one's interpretation of $\delta$, while violation of (2.19) makes C-O generally
inconsistent for $\delta$.

Condition (4.9) (or (2.19)) formally illustrates the point made earlier:
the common factor restrictions on the dynamic regression play no direct role
in the consistency of C-O. This fact helps to explain why in certain
applications OLS estimates and C-O estimates appear to be close, even though
the common factor restrictions are rejected. Or, on the other hand, why the
common factor restrictions can appear to be supported by the data, yet C-O
produces substantially different estimates of $\beta$.

Condition (4.9) is also the condition underlying the differencing
specification test proposed by Plosser, Schwert, and White (1983), which
compares the OLS coefficients from the regression in levels to the
coefficients from a regression with differenced data. PSW simply set $\rho = 1$
in computing $\tilde\beta$; in other words, $\tilde\beta$ is obtained from the regression

(4.11)  $\Delta y_t$ on $\Delta z_t$.

As shown above, $\rho$ can be set to any number (provided the data are stationary
or trend-stationary) or estimated by C-O (in which case $|\tilde\rho| < 1$ with
probability approaching 1); the condition sufficient for consistency is
always (4.9). A regression using the quasi-differenced data can be used to
obtain $\tilde\beta$.

A test of (4.9) can be derived using Hausman's approach but, because the
C-O estimator cannot be guaranteed to be more efficient than OLS under (4.7)
and (4.9), it is easier to construct a direct test based on the OLS residuals
(usually one would compute $\hat\beta$ anyway to see if it differs from $\tilde\beta$ in an
economically significant way). The test procedure is to simply estimate the
model by OLS and use $\lambda_t = z_{t-1} + z_{t+1}$ as the misspecification indicator in
Procedure 3.1. If the presence of distributed lag dynamics is a separate
hypothesis, then (2.19) should be tested directly with $\lambda_t \equiv z_{t+1}$. If the
test rejects, the C-O estimates need not be computed because they are
necessarily inconsistent; OLS should be used to consistently estimate $\beta$,
and robust standard errors and test statistics can be used to perform
inference.
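
The mechanics of testing (2.19) with the lead $z_{t+1}$ as the indicator can be sketched as follows. This is only an illustration of the heteroskedasticity-robust variant, assuming steps analogous to (ii) and (iii') as they are described for the 2SLS case in Section 5; it omits the serial correlation correction of Procedure 3.1. The simulated data and the helper names `ols_resid` and `hetero_robust_stat` are hypothetical, not from the paper.

```python
import numpy as np

def ols_resid(y, X):
    # Residuals from regressing y (vector or matrix) on the columns of X.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def hetero_robust_stat(u_hat, X, lam):
    # Partial the regressors out of the indicators, multiply by the residuals,
    # then regress 1 on the products; the statistic is T - SSR.
    r = ols_resid(lam, X)
    xi = u_hat[:, None] * r
    ssr = np.sum(ols_resid(np.ones(len(u_hat)), xi) ** 2)
    return len(u_hat) - ssr

rng = np.random.default_rng(0)
T = 500
z = np.zeros(T + 1)
for t in range(1, T + 1):          # simulated AR(1) regressor (illustrative only)
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
u = rng.standard_normal(T + 1)     # errors unrelated to all leads and lags of z
y = 1.0 + 0.5 * z + u

# Keep the first T observations so the lead z_{t+1} exists for every row.
X = np.column_stack([np.ones(T), z[:-1]])
lam = z[1:].reshape(-1, 1)         # indicator lambda_t = z_{t+1}
u_hat = ols_resid(y[:-1], X)
stat = hetero_robust_stat(u_hat, X, lam)   # compare with chi-square(1) critical values
```

Because the simulated errors are serially uncorrelated, the heteroskedasticity-robust form is adequate here; with serially correlated errors the prewhitening steps of the full procedure would be needed.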

The strength of this approach is that it tests only the assumption
needed for C-O to be consistent, and provides insight into why the OLS and
Cochrane-Orcutt estimates might be far apart. Unfortunately, a failure to
reject only leads to further questions. While a failure to reject lends
support for (2.19), one cannot be confident that (2.16) and (2.17) hold.
Thus, although C-O might be consistent for $\delta$, it does not necessarily have
the other desirable properties usually associated with it (being more
efficient than OLS, resulting in computationally simple test statistics).

In order to justify the use of the usual C-O statistics (possibly
corrected for neglected heteroskedasticity), one should test the validity of
the common factor restrictions as well as (2.19). Because (2.15) and (2.17)
are difficult to relax in any useful way while maintaining the validity of
statistics from C-O estimation, the common factor tests are derived under
these assumptions.

To derive the LM test of common factor restrictions based on the C-O
estimates, let $\tilde\alpha$, $\tilde\delta$, and $\tilde\rho$ be the Cochrane-Orcutt estimators of $\alpha$, $\delta$, and
$\rho$. The residuals from this estimation are

(4.12)  $\tilde e_t \equiv \tilde u_t - \tilde\rho\tilde u_{t-1},$

where $\tilde u_t \equiv y_t - x_t\tilde\beta$, $\tilde y_t \equiv y_t - \tilde\rho y_{t-1}$, and $\tilde x_t \equiv x_t - \tilde\rho x_{t-1}$. (The first
observation can be treated in the usual way: $\tilde y_1 = [1 - \tilde\rho^2]^{1/2}y_1$,
$\tilde x_1 = [1 - \tilde\rho^2]^{1/2}x_1 = [1 - \tilde\rho^2]^{1/2}(1, z_1)$, $\tilde u_1 = y_1 - x_1\tilde\beta$, $\tilde e_1 = [1 - \tilde\rho^2]^{1/2}\tilde u_1$.)
The gradient of the restricted regression function with respect to $\alpha$, $\delta$, and
$\rho$ evaluated at the estimates is

(4.13)  $(x_t - \tilde\rho x_{t-1},\; y_{t-1} - x_{t-1}\tilde\beta) = (\tilde x_t, \tilde u_{t-1}).$

The unrestricted gradient is simply $(1, z_t, y_{t-1}, z_{t-1})$. The standard LM test
is obtained as $TR^2$ from the regression

(4.14)  $\tilde e_t$ on $1, z_t, y_{t-1}, z_{t-1}$

or equivalently from

$\tilde e_t$ on $1, z_t, \tilde u_{t-1}, z_{t-1}$.

(When a trend is included in the original estimation, a trend is simply
included in (4.14); just as the common factor restriction on the intercept
cannot be tested, neither can those on polynomial trends.) Under the
assumptions (2.16), (2.17), and conditional homoskedasticity, $TR^2 \overset{d}{\to} \chi^2_J$.

The heteroskedasticity-robust form is obtained by applying the results
of Wooldridge (1990b) for nonlinear regression: first regress

(4.15)  $z_{t-1}$ on $1, \tilde z_t, \tilde u_{t-1}$

(where $\tilde z_t \equiv z_t - \tilde\rho z_{t-1}$) and save the $1 \times J$ residuals, say $\tilde r_t$. Then use
$TR_u^2 = T - SSR$ from the regression

(4.16)  $1$ on $\tilde e_t\tilde r_t$

as asymptotically $\chi^2_J$ under $H_0$. This is completely analogous to steps (i),
(ii), and (iii') in section 3. If time trends are included in the initial
estimation, the same functions of time are included on the right hand side of
(4.15).
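
The two common factor statistics can be sketched numerically as follows. The sketch assumes a scalar $z_t$, a two-step (non-iterated) Cochrane-Orcutt fit, and that the first observation is dropped rather than rescaled; the simulated data and all variable names are illustrative assumptions.

```python
import numpy as np

def ols(y, X):
    # OLS coefficients and residuals from regressing y on the columns of X.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

rng = np.random.default_rng(1)
T = 600
z = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):                    # simulated AR(1) errors with rho = 0.5
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * z + u
X = np.column_stack([np.ones(T), z])

# Two-step Cochrane-Orcutt: OLS, residual autoregression, quasi-differenced OLS.
_, u_ols = ols(y, X)
rho = float(u_ols[1:] @ u_ols[:-1] / (u_ols[:-1] @ u_ols[:-1]))
y_qd = y[1:] - rho * y[:-1]
X_qd = X[1:] - rho * X[:-1]
beta_co, e = ols(y_qd, X_qd)             # e_t = u_t - rho * u_{t-1} at the C-O estimates
u_co = y - X @ beta_co                   # C-O residuals in levels

n = T - 1
ones = np.ones(n)
z_t, z_lag, u_lag = z[1:], z[:-1], u_co[:-1]
z_qd = z_t - rho * z_lag

# Usual LM form: n times the uncentered R-squared of e on 1, z_t, u_{t-1}, z_{t-1}.
_, v = ols(e, np.column_stack([ones, z_t, u_lag, z_lag]))
lm = n * (1.0 - (v @ v) / (e @ e))

# Robust form: residuals of z_{t-1} on (1, quasi-differenced z_t, u_{t-1}),
# then n - SSR from regressing 1 on the product e_t * r_t.
_, r = ols(z_lag, np.column_stack([ones, z_qd, u_lag]))
_, resid1 = ols(ones, (e * r).reshape(-1, 1))
lm_robust = n - resid1 @ resid1
```

Both statistics would be compared with $\chi^2_1$ critical values in this scalar case, since $J = 1$.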

These tests have immediate extensions for the finite distributed lag
model of order $Q$ with AR(1) errors:

(4.17)  $y_t = \alpha + z_t\delta_0 + \ldots + z_{t-Q}\delta_Q + u_t, \quad E(u_t \mid z_t, z_{t-1}, \ldots) = 0$

(4.18)  $u_t = \rho u_{t-1} + e_t, \quad E(e_t \mid z_t, \phi_{t-1}) = 0.$

Under (4.17), the additional condition needed for C-O to consistently
estimate $\delta_0, \ldots, \delta_Q$ is (2.19) (note that all lags of $z$ are assumed to be
uncorrelated with $u$, so that the analog of (4.9) reduces to (2.19)). Thus,
the test is carried out as before: let $\hat u_t$ be the residuals from the OLS
regression

$y_t$ on $1, z_t, z_{t-1}, \ldots, z_{t-Q},$

let $x_t \equiv (1, z_t, \ldots, z_{t-Q})$, let $\lambda_t = z_{t+1}$, and use these in Procedure 3.1.

For the common factor test, let $\tilde e_t \equiv \tilde u_t - \tilde\rho\tilde u_{t-1}$, where
$\tilde u_t \equiv y_t - \tilde\alpha - z_t\tilde\delta_0 - \ldots - z_{t-Q}\tilde\delta_Q$ and $\tilde\alpha, \tilde\delta_0, \ldots, \tilde\delta_Q, \tilde\rho$ are obtained from a
Cochrane-Orcutt procedure. The usual LM test is simply $TR^2$ from

(4.19)  $\tilde e_t$ on $1, z_t, \tilde u_{t-1}, z_{t-1}, \ldots, z_{t-Q-1},$

which is asymptotically $\chi^2_J$ under $H_0$ and conditional homoskedasticity.
The heteroskedasticity-robust test obtains $\tilde r_t$ as the $1 \times J$
residuals from the regression

(4.20)  $z_{t-Q-1}$ on $1, \tilde z_t, \ldots, \tilde z_{t-Q}, \tilde u_{t-1},$

where $\tilde z_{t-j} \equiv z_{t-j} - \tilde\rho z_{t-j-1}$, $j = 0, \ldots, Q$, and uses them as in (4.15). Again,
any time trend used in obtaining the C-O estimates should be used on the
right hand side of (4.20).

One caveat about these tests. The test of (2.19) recommended here is
not the best possible if (2.15) and (2.17) are maintained under $H_0$. In this
case, it would be better to base a test on the C-O residuals rather than on
the OLS residuals (or construct a Hausman test which directly compares $\hat\delta$ and
$\tilde\delta$). But the tests using $\lambda_t = z_{t+1}$ in Procedure 3.1 are robust in that they
take only (2.19) to be the null. This test can be used to indicate whether
C-O is leading one astray in terms of parameter estimates. The test for
common factor restrictions, which has been derived under (2.16) and (2.17),
can then be used to check the additional assumptions required for the
validity of the statistics based on quasi-differenced data.

5.  Results  for  Two  Stage  Least  Squares 

Again consider the linear model

(5.1)  $y_t = x_t\beta + u_t, \quad t = 1, 2, \ldots,$

where $x_t$ is $1 \times K$ and $y_t$ and $u_t$ are scalars. However, suppose now that the
parameters $\beta$ do not index the conditional expectation $E(y_t \mid x_t)$ or, more
traditionally, that some elements of $x_t$ are correlated with $u_t$. This can be
the case for a variety of reasons: (5.1) might be an equation in a
simultaneous equations model where $x_t$ contains jointly determined variables;
$x_t$ might contain proxies of the true variables of interest; or (5.1) might
omit variables that one would like to control for. In such cases there is a
conditional expectation that one would like to estimate but simultaneity,
sample selection, errors in variables, or unobserved variables make it
impossible to do so by an OLS regression of $y_t$ on $x_t$.

Instead, let $w_t$ be a $1 \times L$ vector of instrumental variables chosen from
$(z_t, \phi_{t-1})$ with $L \geq K$; the restriction on $w_t$ is

(5.2)  $E(u_t \mid w_t) = 0;$

for some of the subsequent analysis (5.2) can be replaced by the zero
correlation assumption $E(w_t'u_t) = 0$, but for clarity (5.2) is assumed to be in
force throughout. The vector $w_t$ excludes any elements of $z_t$ that are
simultaneously determined with $y_t$, but $w_t$ would contain the elements of $x_t$
that satisfy (5.2). Recall that the 2SLS estimator of $\beta$ is

(5.3)  $\hat\beta = (\hat X'X)^{-1}\hat X'Y = (\hat X'\hat X)^{-1}\hat X'Y = \beta + (\hat X'\hat X)^{-1}\hat X'U,$

where $\hat X$ is the $T \times K$ matrix with $t$th row $\hat x_t$, and $\hat x_t \equiv w_t\hat\Pi$ is the $1 \times K$ vector of
fitted values from the regression

$x_t$ on $w_t$, $t = 1, \ldots, T$.

Analogous to the OLS estimator, (5.2) is the crucial condition for the 2SLS
estimator to be consistent for $\beta$. The errors $u_t$ can contain fairly arbitrary
forms of heteroskedasticity and serial correlation, and $x_t$ and $w_t$ can contain
lagged dependent variables. Under standard regularity conditions the 2SLS
estimator is asymptotically normally distributed.

By stationarity, the coefficients from the linear projection of $x_t$ on $w_t$
are time invariant: $x_t^* \equiv w_t\Pi$, where $\Pi \equiv [E(w_t'w_t)]^{-1}E(w_t'x_t)$. An important
fact about the 2SLS estimator is that

(5.4)  $\sqrt{T}(\hat\beta - \beta) = (X^{*\prime}X^*/T)^{-1}T^{-1/2}X^{*\prime}U + o_p(1) = A^{-1}T^{-1/2}X^{*\prime}U + o_p(1),$

where $A \equiv E(x_t^{*\prime}x_t^*)$. Equation (5.4) shows that the fact the fitted values $\hat x_t$
have been estimated does not affect the limiting distribution of the 2SLS
estimator; the same limiting distribution is obtained if $\hat x_t$ is replaced by
$x_t^*$, i.e. if $\Pi$ replaces $\hat\Pi$. This makes it easy to obtain consistent standard
errors in a variety of circumstances. In the general case, the asymptotic
covariance matrix of $\sqrt{T}(\hat\beta - \beta)$ is given by $V \equiv A^{-1}BA^{-1}$ where now

(5.5)  $B \equiv E(s_t^{*\prime}s_t^*) + \sum_{j=1}^{\infty}\big[E(s_{t+j}^{*\prime}s_t^*) + E(s_t^{*\prime}s_{t+j}^*)\big],$

and $s_t^* \equiv x_t^*u_t$.

In the context of 2SLS, the assumption of no serial correlation is most
easily stated as

(5.6)  $E(u_{t+j}u_t \mid w_{t+j}, w_t) = 0, \quad j \geq 1.$

In (5.6), $w_t$ can be replaced by $x_t^*$ with the subsequent results going through.
Technically, this allows for certain forms of serial correlation ruled out by
(5.6), but the additional generality is quite modest.
The appropriate homoskedasticity assumption is

(5.7)  $E(u_t^2 \mid w_t) = \sigma^2, \quad t = 1, 2, \ldots,$

which imposes homoskedasticity of $u_t$ conditional on the instruments $w_t$.
Again, $x_t^*$ can replace $w_t$ in stating (5.7).

Under (5.6) and (5.7), the usual asymptotic covariance matrix estimator
is consistent for avar $\sqrt{T}(\hat\beta - \beta)$. Let $\hat u_t \equiv y_t - x_t\hat\beta$ denote the 2SLS
residuals. Then a consistent estimator of $\sigma^2$ is given by

$$\hat\sigma^2 \equiv (T-K)^{-1}\sum_{t=1}^{T}\hat u_t^2;$$

the degrees of freedom adjustment does not make a difference asymptotically,
and is used by most regression packages. The asymptotic standard error of $\hat\beta_j$
is the square root of the $j$th diagonal element of

$$\hat\sigma^2(\hat X'\hat X)^{-1}.$$

This is what is printed out by all regression packages.

In the present context, dynamic completeness is defined by

(5.8)  $E(u_t \mid w_t, \phi_{t-1}) = 0,$

where $\phi_{t-1}$ contains all past values of $w$, $x$, and $y$. As with the case of OLS,
(5.8) can be shown to imply (5.6) by a standard application of the law of
iterated expectations. Setting

$$\hat B \equiv \hat X'\hat\Sigma\hat X/(T-K) = (T-K)^{-1}\sum_{t=1}^{T}\hat u_t^2\hat x_t'\hat x_t,$$

where $\hat\Sigma \equiv \mathrm{diag}(\hat u_1^2, \ldots, \hat u_T^2)$, a heteroskedasticity-robust covariance matrix
estimator is

(5.9)  $\hat V = (\hat X'\hat X/T)^{-1}[\hat X'\hat\Sigma\hat X/(T-K)](\hat X'\hat X/T)^{-1},$

and the asymptotic standard error of $\hat\beta_j$ is the square root of the $j$th
diagonal element of $\hat V/T$.

For the general case that $\{w_t'u_t\}$ might be serially correlated and
$E(u_t^2 \mid w_t)$ nonconstant, let

$$\hat\Omega_j \equiv T^{-1}\sum_{t=j+1}^{T}\hat s_t'\hat s_{t-j}, \qquad \hat s_t \equiv \hat x_t\hat u_t,$$

and compute an estimator of $B$ by (2.22). The only difference between this
case and the OLS case is that $x_t$ has been replaced by $\hat x_t$ everywhere except
(as usual in 2SLS contexts) in the computation of $\hat u_t$. The asymptotic
variance estimator of $\hat\beta$ is still given by

$$\hat V/T \equiv (\hat X'\hat X)^{-1}(T\hat B)(\hat X'\hat X)^{-1}.$$

Procedure 3.1 has an immediate generalization for computing a
heteroskedasticity-serial correlation robust standard error for $\hat\beta_j$:

PROCEDURE 5.1:

(i) Estimate $\beta$ by 2SLS using instruments $w_t$. This yields "se($\hat\beta_j$)",
$\hat\sigma$, and the 2SLS residuals $\{\hat u_t: t = 1, \ldots, T\}$. Obtain the fitted values $\hat x_t$ from
the first step regression

$x_t$ on $w_t$.

(ii) Compute the residuals $\{\hat r_{tj}: t = 1, \ldots, T\}$ from the regression

(5.10)  $\hat x_{tj}$ on $\hat x_{t1}, \ldots, \hat x_{t,j-1}, \hat x_{t,j+1}, \ldots, \hat x_{tK}, \quad t = 1, \ldots, T.$

(iii) Set $\hat\xi_t \equiv \hat r_{tj}\hat u_t$ and run the regression

(5.11)  $\hat\xi_t$ on $\hat\xi_{t-1}, \ldots, \hat\xi_{t-G},$

where $G$ is, say, the integer part of $T^{1/4}$. Compute the spectral density
estimator

$$\hat c_j \equiv [T/(T-K)]\,\hat\tau^2/(1 - \hat a_1 - \hat a_2 - \ldots - \hat a_G)^2,$$

where $\hat a_j$, $j = 1, \ldots, G$, are the OLS coefficients from the autoregression (5.11)
and $\hat\tau^2$ is the square of the standard error of the regression.

(iv) Compute se($\hat\beta_j$) from

(5.12)  se($\hat\beta_j$) $\equiv$ ["se($\hat\beta_j$)"$/\hat\sigma$]$^2(T\hat c_j)^{1/2}$.

The standard error from Procedure 5.1 is both heteroskedasticity and (as $T \to \infty$)
serial correlation robust. Showing that this produces a consistent
standard error follows along the lines of Wooldridge (1990b).
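
Steps (i)-(iv) of Procedure 5.1 can be sketched in a few lines. This is an illustration on simulated data, not a reference implementation: the function name `robust_se_2sls`, the data generating process, and the degrees of freedom used for $\hat\tau^2$ (observations used in (5.11) minus $G$) are assumptions.

```python
import numpy as np

def ols(y, X):
    # OLS coefficients and residuals from regressing y on the columns of X.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def robust_se_2sls(y, X, W, j):
    T, K = X.shape
    # (i) first stage fitted values, 2SLS estimates, residuals, usual se
    Pi, _ = ols(X, W)
    X_hat = W @ Pi
    beta, _ = ols(y, X_hat)
    u_hat = y - X @ beta                      # residuals use x_t, not the fitted values
    sigma2 = u_hat @ u_hat / (T - K)
    usual_se = np.sqrt(sigma2 * np.linalg.inv(X_hat.T @ X_hat)[j, j])
    # (ii) partial the other fitted regressors out of column j
    _, r = ols(X_hat[:, j], np.delete(X_hat, j, axis=1))
    # (iii) AR(G) regression of xi_t = r_tj * u_hat_t, no intercept
    xi = r * u_hat
    G = int(T ** 0.25)
    lags = np.column_stack([xi[G - g:T - g] for g in range(1, G + 1)])
    a, resid = ols(xi[G:], lags)
    tau2 = resid @ resid / (len(resid) - G)   # assumed degrees of freedom choice
    c = (T / (T - K)) * tau2 / (1.0 - a.sum()) ** 2
    # (iv) rescale the usual standard error as in (5.12)
    se = (usual_se ** 2 / sigma2) * np.sqrt(T * c)
    return {"se": se, "usual_se": usual_se, "sigma2": sigma2, "r_ss": r @ r}

rng = np.random.default_rng(2)
T = 400
w1, w2, e = np.zeros(T), np.zeros(T), np.zeros(T)
for t in range(1, T):                         # serially correlated instruments and errors
    w1[t] = 0.6 * w1[t - 1] + rng.standard_normal()
    w2[t] = 0.6 * w2[t - 1] + rng.standard_normal()
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
v = rng.standard_normal(T)
x = w1 + 0.5 * w2 + v                         # endogenous regressor
u = e + 0.5 * v                               # error correlated with x through v
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])
W = np.column_stack([np.ones(T), w1, w2])
out = robust_se_2sls(y, X, W, j=1)
```

The factor `usual_se ** 2 / sigma2` equals the reciprocal of the sum of squared residuals from step (ii), which is why only quantities already printed by a 2SLS routine are needed in step (iv).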

Regression-based specification tests require only a slight modification
from the OLS case. As in Section 3, let $\psi_t$ denote a set of "exogenous"
and/or predetermined variables from $(z_t, \phi_{t-1})$. Elements of $x_t$ that are
correlated with $u_t$ are excluded by definition, but $\psi_t$ can contain elements
from $w_t$ and other variables from $\phi_{t-1}$ that should be uncorrelated with $u_t$
under $H_0$. The null hypothesis is taken to be

(5.13)  $H_0: E(u_t \mid w_t, \psi_t) = 0.$

Let $\lambda(\psi_t, \eta)$ be a $1 \times Q$ vector of misspecification indicators with nuisance
parameter estimator $\hat\eta$. The test of (5.13) is based on

$$T^{-1}\sum_{t=1}^{T}\hat\lambda_t'\hat u_t,$$

where $\hat\lambda_t \equiv \lambda(\psi_t, \hat\eta)$ and $\hat\eta$ is a nuisance parameter estimator. The general
H/SC-robust procedure is an immediate extension from Procedure 3.1.

PROCEDURE 5.2:

(i) Obtain $\hat u_t$ as the residuals from the 2SLS regression

$y_t$ on $x_t$ using instruments $w_t$.

Compute the $1 \times K$ fitted values $\hat x_t$ from the first stage regression of $x_t$ on $w_t$,
or from $x_t$ on $w_t$, $\hat\lambda_t$.

(ii) Obtain $\hat r_t$ as the $1 \times Q$ vectors of residuals from the regression

$\hat\lambda_t$ on $\hat x_t$.

(iii) Define $\hat\xi_t$ to be the $1 \times Q$ vector $\hat\xi_t \equiv \hat u_t\hat r_t$. Save the $1 \times Q$
residuals $\hat\nu_t$ from the VAR(G) regression

$\hat\xi_t$ on $\hat\xi_{t-1}, \ldots, \hat\xi_{t-G}$.

(iv) Use $TR_u^2 = T - SSR$ from the regression

$1$ on $\hat\nu_t$;

$T$ can be the actual number of observations used in this final regression.
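
The four steps of Procedure 5.2 translate directly into a sequence of least squares fits. The sketch below is illustrative only: the function name `hsc_robust_test`, the simulated data, and the choice of one surplus instrument as the indicator are assumptions, and the first stage uses $w_t$ alone.

```python
import numpy as np

def ols(y, X):
    # OLS coefficients and residuals; y may be a vector or a matrix.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def hsc_robust_test(y, X, W, lam, G):
    T = len(y)
    # (i) 2SLS residuals and first stage fitted values
    Pi, _ = ols(X, W)
    X_hat = W @ Pi
    beta, _ = ols(y, X_hat)
    u_hat = y - X @ beta
    # (ii) residuals of the indicators on the fitted regressors
    _, r = ols(lam, X_hat)
    # (iii) VAR(G) regression of xi_t = u_hat_t * r_t on its own lags
    xi = u_hat[:, None] * r
    lags = np.column_stack([xi[G - g:T - g] for g in range(1, G + 1)])
    _, nu = ols(xi[G:], lags)
    # (iv) number of observations used minus SSR from regressing 1 on nu_t
    _, resid1 = ols(np.ones(T - G), nu)
    return (T - G) - resid1 @ resid1

rng = np.random.default_rng(3)
T = 400
w1, w2, e = np.zeros(T), np.zeros(T), np.zeros(T)
for t in range(1, T):                   # serially correlated instruments and errors
    w1[t] = 0.6 * w1[t - 1] + rng.standard_normal()
    w2[t] = 0.6 * w2[t - 1] + rng.standard_normal()
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
v = rng.standard_normal(T)
x = w1 + 0.5 * w2 + v
y = 1.0 + 2.0 * x + (e + 0.5 * v)
X = np.column_stack([np.ones(T), x])
W = np.column_stack([np.ones(T), w1, w2])
lam = w2.reshape(-1, 1)                 # Q = 1 indicator: an instrument not in x_t
G = int(T ** 0.25)
n_used = T - G
stat = hsc_robust_test(y, X, W, lam, G)  # compare with chi-square(Q) critical values
```

If the indicators are thought to help predict $x_t$ under the null, `W` in the first stage would be augmented with `lam`, in line with the alternative offered in step (i).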

The choice of whether to compute $\hat x_t$ from the regression of $x_t$ on $w_t$ or $x_t$ on
$w_t$ and $\hat\lambda_t$ depends on what is assumed about $E(x_t \mid w_t, \lambda_t)$ under $H_0$. To see the
issue, note that

(5.14)  $T^{-1/2}\sum_{t=1}^{T}\hat\xi_t = T^{-1/2}\sum_{t=1}^{T}\hat u_t\hat r_t = T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\hat\lambda_t - \hat x_t\hat C),$

where

$$\hat C \equiv (\hat X'\hat X)^{-1}\hat X'\hat\Lambda.$$

Underlying Procedure 5.2 is the assumption

(5.15)  $T^{-1/2}\sum_{t=1}^{T}\hat\xi_t = T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\beta)(\lambda_t - x_t^*C) + o_p(1),$

where $x_t^*$ are the population analogs of $\hat x_t$ and $C \equiv [E(x_t^{*\prime}x_t^*)]^{-1}E(x_t^{*\prime}\lambda_t)$; in
other words, each estimator implicit in $\hat\xi_t$ can be replaced by its plim
without altering the asymptotic distribution of its standardized partial sum.
This was shown to always be the case for the OLS tests in Section 3. As in
section 3, the fact that $\hat\lambda_t$ is estimated does not affect the limiting
distribution of $T^{-1/2}\sum_{t=1}^{T}\hat\xi_t$ under $H_0$. Also, by the first order condition for
the 2SLS estimator,

$$T^{-1/2}\sum_{t=1}^{T}\hat\xi_t = T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\hat\lambda_t - \hat x_tC),$$

so

$$T^{-1/2}\sum_{t=1}^{T}\hat\xi_t = T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\lambda_t - \hat x_tC) + o_p(1).$$

But

$$\begin{aligned}
T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\lambda_t - \hat x_tC) &= T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\lambda_t - x_t^*C) \\
&\quad + T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)w_t(\Pi - \hat\Pi)C \\
&= T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\lambda_t - x_t^*C) + O_p(1)\cdot o_p(1)
\end{aligned}$$
under $H_0$. Consequently, (5.15) holds if it can be shown that

$$T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\hat\beta)(\lambda_t - x_t^*C) = T^{-1/2}\sum_{t=1}^{T}(y_t - x_t\beta)(\lambda_t - x_t^*C) + o_p(1)$$

under $H_0$. Because $\sqrt{T}(\hat\beta - \beta) = O_p(1)$ and $E[(\lambda_t - x_t^*C)'u_t] = 0$ under $H_0$, it
suffices to show that

(5.16)  $T^{-1}\sum_{t=1}^{T}x_t'(\lambda_t - x_t^*C) \overset{p}{\to} 0;$

by the WLLN, (5.16) holds provided

(5.17)  $E[x_t'(\lambda_t - x_t^*C)] = 0.$

Suppose now that

(5.18)  $x_t^* = E(x_t \mid w_t, \lambda_t).$

Then the law of iterated expectations implies that

(5.19)  $E[x_t'(\lambda_t - x_t^*C)] = E[E(x_t \mid w_t, \lambda_t)'(\lambda_t - x_t^*C)] = E[x_t^{*\prime}(\lambda_t - x_t^*C)] = 0$

by definition of $C$, verifying (5.17). Thus, the validity of Procedure 5.2
generally requires $x_t^*$ to be the predicted values from the population
regression of $x_t$ on $w_t$ and $\lambda_t$. However, in many cases, $E(x_t \mid w_t, \lambda_t) =
E(x_t \mid w_t) = w_t\Pi$ under $H_0$ (otherwise $\lambda_t$ would already be in the instrument
list), in which case $\hat x_t$ can be obtained from the regression of $x_t$ on $w_t$. If
there is any doubt whether $\lambda_t$ has additional predictive power for $x_t$ under
$H_0$, then the regression of $x_t$ on $w_t$, $\hat\lambda_t$ can be used to obtain $\hat x_t$.

If the null hypothesis imposes (5.8) then $E(u_{t+j}u_t \mid w_{t+j}, \psi_{t+j}, w_t, \psi_t)
= 0$ for any $\psi_t \subset (z_t, \phi_{t-1})$, and the test need not be made robust to serial
correlation. A heteroskedasticity-robust test is obtained by replacing (iii)
and (iv) by

(iii') Regress

(5.20)  $1$ on $\hat\xi_t$, $t = 1, \ldots, T$

and use $T - SSR$ as asymptotically $\chi^2_Q$.

A test which imposes $E(u_t^2 \mid w_t, \psi_t) = \sigma^2$ and $E(u_{t+j}u_t \mid w_{t+j}, \psi_{t+j}, w_t, \psi_t) = 0$ is
obtained as $TR_u^2$ from the regression

(5.21)  $\hat u_t$ on $\hat x_t$, $\hat\lambda_t$.

Both (5.21) and (5.20) require (5.18).

EXAMPLE 5.1 (Testing For Omitted Variables): Consider the model

(5.22)  $y_t = x_{t1}\beta_1 + x_{t2}\beta_2 + u_t,$

where the null hypothesis is

$H_0: \beta_2 = 0.$

Both $x_{t1}$ and $x_{t2}$ can contain elements correlated with $u_t$. In general, the
list of valid instruments can change under the null and alternative models.
For example, if $x_{t2}$ contains lagged endogenous variables then, in many cases,
these lags would not be used as instruments if $H_0$ were true. Let $w_t$ be a $1 \times L$
set of valid instruments under $H_0$. Assume that $L \geq K \equiv K_1 + K_2$. Let $w_{t1}$ be
a $1 \times L_1$ subvector of $w_t$ such that $E(x_{t1} \mid w_t) = w_{t1}\Pi_1$; assume that $L_1 \geq K_1$. Let
$\hat\beta_1$ and $\hat u_t$ denote the 2SLS statistics obtained using instruments $w_{t1}$ under the
restriction $H_0: \beta_2 = 0$, and let $\hat x_{t1}$ denote the fitted values from the first
step regression $x_{t1}$ on $w_{t1}$. (If $x_{t1}$ is exogenous then $w_{t1} = x_{t1}$ and $\hat x_{t1} =
x_{t1}$.) Let $\hat x_{t2}$ be the fitted values from the regression $x_{t2}$ on $w_t$. Then $\hat u_t$
and $\hat\lambda_t = \hat x_{t2}$ can be used in Procedure 5.2. What is really being tested is
whether plim $T^{-1}\sum_{t=1}^{T}\hat x_{t2}'u_t = 0$. If the test is intended to detect dynamic
misspecification then (5.20) or (5.21) can be used according to whether or
not homoskedasticity is maintained.

EXAMPLE 5.2 (Testing for Serial Correlation): Let $\hat u_t$ and $\hat x_t$ be obtained from
2SLS estimation of

$y_t$ on $x_t$ using IV's $w_t$.

A test for AR(Q) serial correlation is obtained by using $\hat\lambda_t \equiv (\hat u_{t-1}, \ldots, \hat u_{t-Q})$
in (i), (ii), and (iii') or in regression (5.21) (not robust to
heteroskedasticity). If $u_{t-1}, \ldots, u_{t-Q}$ add explanatory power to $x_t$ under $H_0$
then $\hat x_t$ should be obtained from $x_t$ on $w_t, \hat u_{t-1}, \ldots, \hat u_{t-Q}$.

EXAMPLE 5.3 (Testing for Endogeneity): Let the model be partitioned as in
(5.22), where $x_{t1}$ is taken to be exogenous. The issue is whether $x_{t2}$ is
endogenous:

$H_0: E(u_t \mid x_{t2}) = 0.$

The model is estimated by OLS under $H_0$, so let $\hat u_t$ denote the OLS residuals
from the regression $y_t$ on $x_{t1}$, $x_{t2}$. If $w_t$ denotes a set of instruments that
includes $x_{t1}$ but not $x_{t2}$ then a Hausman test which compares the OLS and 2SLS
estimators can be shown to be based on the sample covariance

$$T^{-1}\sum_{t=1}^{T}\hat x_{t2}'\hat u_t,$$

where the $\hat x_{t2}$ are the fitted values from the first stage regression $x_{t2}$ on
$w_t$. Thus, take $\hat x_t \equiv x_t$ and $\hat\lambda_t = \hat x_{t2}$ in Procedure 5.2; the degrees of freedom
of the test is $K_2$, the dimension of $x_{t2}$. A test which assumes
homoskedasticity and no serial correlation is based on $TR_u^2$ from the
regression

$\hat u_t$ on $x_{t1}$, $x_{t2}$, $\hat x_{t2}$;

see Hausman (1983). Steps (i)-(iv) or (i)-(iii') can be used to obtain
robust versions. Note that, because the null model is estimated by OLS, this
test also falls under Procedure 3.1.
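
The non-robust version of the endogeneity test in Example 5.3 amounts to two least squares fits and one auxiliary regression. The sketch below uses simulated data in which $x_{t2}$ is endogenous by construction; the data generating process and variable names are illustrative assumptions.

```python
import numpy as np

def ols(y, X):
    # OLS coefficients and residuals from regressing y on the columns of X.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

rng = np.random.default_rng(4)
T = 500
x1 = rng.standard_normal(T)              # exogenous regressor
w2 = rng.standard_normal(T)              # instrument excluded from the model
v = rng.standard_normal(T)
x2 = 0.3 * x1 + 0.7 * w2 + v             # suspect regressor
u = rng.standard_normal(T) + 0.8 * v     # error correlated with x2 through v
y = 1.0 + x1 + x2 + u

X = np.column_stack([np.ones(T), x1, x2])
W = np.column_stack([np.ones(T), x1, w2])

_, u_hat = ols(y, X)                     # OLS residuals under the null
pi2, _ = ols(x2, W)
x2_hat = W @ pi2                         # first stage fitted values of x2

# T times the uncentered R-squared from u_hat on x1, x2 (with intercept) and x2_hat.
_, resid = ols(u_hat, np.column_stack([X, x2_hat]))
stat = T * (1.0 - (resid @ resid) / (u_hat @ u_hat))
K2 = 1                                   # degrees of freedom: dimension of x2
```

With this design the statistic should be far above conventional $\chi^2_1$ critical values, reflecting the built-in correlation between $x_{t2}$ and $u_t$.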

EXAMPLE 5.4 (Testing Overidentifying Restrictions): Let the model be

$y_t = x_t\beta + u_t,$

where $x_t$ is $1 \times K$. Let $w_t$ be a $1 \times L$ vector of instruments, where $L > K$. If $\hat u_t$
denotes the 2SLS residuals $y_t - x_t\hat\beta$, a test of overidentifying restrictions,
which assumes homoskedasticity and no serial correlation under the null, is
obtained as $TR_u^2$ from the regression $\hat u_t$ on $w_t$; $TR_u^2$ is asymptotically $\chi^2_Q$,
where $Q \equiv L - K$. Procedure 5.2 is applied by taking $\hat x_t = w_t\hat\Pi$ and $\hat\lambda_t$ any $Q$
elements from $w_t$ that are not also elements of $x_t$. This produces an
H/SC-robust test of the overidentifying restrictions. Steps (i), (ii), and
(iii') produce the heteroskedasticity-robust form.
6.  Concluding  Remarks

The procedures suggested in this paper offer relatively simple methods
for carrying out inference in linear time series models that are robust to
fairly arbitrary forms of serial correlation and heteroskedasticity. The
standard errors and test statistics discussed in sections 2-5 are
alternatives to more popular methods which model serial correlation in the
errors and impose certain exogeneity assumptions on the regressors. The
H/SC-robust forms of the test statistics require only very weak assumptions
on the errors.

The observation that the very weak requirement $E(u_t \mid x_t) = 0$ (OLS) or
$E(u_t \mid w_t) = 0$ (2SLS) suffices for consistency (along with regularity
conditions) raises an interesting question which has not received much
attention lately. Namely, what exactly should be required of the errors in
time series models? If the errors should only be required to be uncorrelated
with the regressors (OLS) or instruments (2SLS), then the methods of this
paper have significant robustness advantages over more traditional serial
correlation modelling approaches. If "correct specification" requires that
the errors be serially uncorrelated (unforecastable), then many static and
distributed lag models are necessarily misspecified. By this criterion most
time series regressions would need to contain lags of dependent as well as
lags of conditioning variables.

As mentioned in the introduction, many approaches to economic modelling
do not allow one to address the question about what should be required of the
errors. Most of the conditions imposed on the errors have arisen out of
statistical considerations. In the context of the linear model, the no
serial correlation assumption (2.10) (along with the homoskedasticity
assumption (2.11)) validates the usual OLS test statistics, at least
asymptotically. The static model with AR(1) errors, given by (2.16) and
(2.17), originated primarily to obtain standard errors and test statistics
with the usual properties; it also produces an estimator which is
asymptotically more efficient than OLS.

It was some time later that econometricians realized that (2.16) and
(2.17) impose common factor restrictions on the dynamic regression. This
paper has further emphasized the additional exogeneity restriction (2.19).
It seems useful to seek conditions on the errors in static and distributed
lag models that have economic content, rather than being motivated by
statistical considerations. One candidate approach is rational expectations,
which imposes unforecastability given a certain information set.
Unfortunately, many static relationships are estimated without appealing at
all to rational expectations. A broader set of criteria is needed. One
possible requirement, which seems not to have appeared in the literature, is
that $u$ be Granger causally prior to $z$, i.e.

(6.1)  $E(u_t \mid u_{t-1}, z_{t-1}, u_{t-2}, z_{t-2}, \ldots) = E(u_t \mid u_{t-1}, u_{t-2}, \ldots).$

Assuming linear expectations and first order dynamics, this is the same as

(6.2)  $E(u_t \mid u_{t-1}, z_{t-1}, u_{t-2}, z_{t-2}, \ldots) = \rho u_{t-1}.$

Because the static/AR(1) model implies that

(6.3)  $E(u_t \mid z_t, u_{t-1}, z_{t-1}, u_{t-2}, z_{t-2}, \ldots) = \rho u_{t-1},$

(6.2) is generally weaker than the assumptions underlying the static/AR(1)
model, unless $\{z_t\}$ is assumed to be strictly exogenous. Examining the
implications of and how to test conditions like (6.1) deserves further
research, but is beyond the scope of the current paper.